Popularity is attractive [1]: this is the formula underlying preferential attachment [2], a popular
explanation for the emergence of scaling in growing networks. If new connections are made preferentially
to more popular nodes, then the resulting distribution of the number of connections that
nodes have follows power laws [3, 4] observed in many real networks [5, 6]. Preferential attachment
has been directly validated for some real networks, including the Internet [7, 8]. Preferential
attachment can also be a consequence of different underlying processes based on node fitness, ranking,
optimization, random walks, or duplication [9–16]. Here we show that popularity is just one
dimension of attractiveness. Another dimension is similarity. We develop a framework where new
connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and
similarity. The framework admits a geometric interpretation, in which popularity preference emerges
from local optimization. As opposed to preferential attachment, the optimization framework accurately
describes large-scale evolution of technological (Internet), social (web of trust), and biological
(E. coli metabolic) networks, predicting the probability of new links in them with remarkable precision.
The developed framework can thus be used for predicting new links in evolving networks,
and provides a different perspective on preferential attachment as an emergent phenomenon.

More similar nodes have higher chances to get connected even if they are not popular. This effect is known as
homophily in social sciences [17, 18], and it has been observed in many real networks [19]. In the Web [20, 21],
for example, an individual creating her new homepage tends to link it not only to popular sites such as Google or
Facebook, but also to not so popular sites that are close to her special interests, e.g., Tartini or free soloing. These
observations suggest introducing a measure of attractiveness that balances popularity and similarity.

The simplest proxy to popularity is the node birth time. All other things equal, older nodes have more chances
to become popular and attract connections [3, 4]. If nodes join the network one by one, then the node birth time
is simply the node number $t = 1, 2, \ldots$. To model similarity, we randomly place nodes on a circle abstracting the
simplest similarity space. That is, the angular distances between nodes model their similarity distances, such as the
cosine similarity or any other measure [22–24]. The simplest way to model a balance between popularity and similarity
is then to establish new connections optimizing the product between popularity and similarity. In other words, the
model is simply: (1) initially the network is empty; (2) at time $t \geq 1$, new node $t$ appears at a random angular
position $\theta_t$ on the circle; and (3) connects to a subset of existing nodes $s$, $s < t$, consisting of the $m$ nodes with the
$m$ smallest values of product $s\theta_{st}$, where $m$ is a parameter controlling the average node degree $\bar{k} = 2m$, and $\theta_{st}$ is the
angular distance between nodes $s$ and $t$ (Fig. 1(a,b)). At early times $t \leq m$, node $t$ connects to all the existing nodes.
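
The growth rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation code used for the figures: the names `angular_distance` and `grow_network`, the fixed random seed, and the tie-breaking by sort order are choices made here for illustration only.

```python
import math
import random


def angular_distance(a, b):
    """Angular distance between two angles on the circle, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def grow_network(n, m, seed=0):
    """Grow a network of n nodes numbered by birth time t = 1..n.

    Each new node t is placed at a uniformly random angle and linked to the
    m existing nodes s < t with the smallest product s * theta_st.
    Returns (theta, edges), where theta maps node -> angle and edges is a
    list of (s, t) pairs with s < t.
    """
    rng = random.Random(seed)  # seed is an illustrative assumption
    theta = {}
    edges = []
    for t in range(1, n + 1):
        theta[t] = rng.uniform(0.0, 2 * math.pi)
        older = list(range(1, t))
        if len(older) <= m:
            # Early times: fewer than (or exactly) m candidates, link to all.
            targets = older
        else:
            targets = sorted(
                older, key=lambda s: s * angular_distance(theta[s], theta[t])
            )[:m]
        edges.extend((s, t) for s in targets)
    return theta, edges
```

For example, `grow_network(50, 2)` yields one link from node 2 and two links from each later node, so the average degree approaches $2m$ as the network grows.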

This model has an interesting geometric interpretation, shown in Fig. 1(c). Specifically, after mapping birth time
$t$ of a node to its radial coordinate $r_t$ via $r_t = \ln t$, all nodes lie not on a circle but on a plane, with polar coordinates
$(r_t, \theta_t)$. It then turns out that new nodes connect simply to the closest $m$ nodes on the plane, except that distances
are not Euclidean but hyperbolic [25]. The hyperbolic distance between two nodes at polar coordinates $(r_s, \theta_s)$ and
$(r_t, \theta_t)$ is approximately $x_{st} = r_s + r_t + \ln(\theta_{st}/2) = \ln(s t \theta_{st}/2)$. Therefore the sets of nodes $s$ minimizing $x_{st}$ or $s\theta_{st}$
for each $t$ are identical. The hyperbolic distance is then nothing but a convenient single-metric representation of a
combination of the two attractiveness attributes, radial popularity and angular similarity. We will use this metric
extensively below.

The networks grown as described may seem to have nothing in common with preferential attachment (PA) [2–4].
Yet we show in Fig. 2(a) that the probability $\Pi(k)$ that an existing node of degree $k$ attracts a connection from a
new node is the same linear function of $k$ in the described model and in PA. It is not surprising then that the degree
distributions in PA and our model are the same power laws. In Section IV we prove that the exponent $\gamma$ of this
power law approaches 2. Preferential attachment thus emerges as an effective process originating from optimization
trade-offs between popularity and similarity.

However, there are crucial differences between such optimization and PA. In the latter, new nodes connect with
the same probability $\Pi(k)$ to any nodes of degree $k$ in the network. In the former, new nodes connect only to
specific subsets of such $k$-degree nodes that are closest to the new node along the similarity dimension $\theta$ (Fig. 1(c)).
To quantify, we compare in Fig. 2(b) the probability of connection between a pair of nodes as a function of their
hyperbolic distance in the two cases. We see that close nodes are almost always connected in the optimization model,
FIG. 1: Geometric interpretation of popularity × similarity optimization. The nodes are numbered by their birth times, and
located at random angular (similarity) coordinates. Upon its birth, the new circled node $t$ connects to $m$ old nodes $s$ minimizing
$s\theta_{st}$. The new connections are shown by the thicker blue links. In (a,b) $t = 3$ and $m = 1$. In (a) node 3 connects to node 2
because $2\theta_{23} = 2\pi/3 < 1\theta_{13} = 5\pi/6$. In (b) node 3 connects to node 1 because $1\theta_{13} = 2\pi/3 < 2\theta_{23} = \pi$. In (c) an
optimization-driven network with $m = 3$ is simulated for up to 20 nodes. The radial (popularity) coordinate of new node $t = 20$ is $r_t = \ln t$,
and the node connects to the three hyperbolically closest nodes. The red shape marks the set of points located at hyperbolic
distances less than $r_t$ from the new node. All nodes drift away from the crossed origin, emulating popularity fading as explained
in the text. The drift speed in the shown network corresponds to the degree distribution exponent $\gamma = 2.1$.
FIG. 2: Emergence of preferential attachment from popularity × similarity optimization. Two growing networks have been
simulated up to t = 10^ nodes, one growing according to the described optimization model, and the other according to PA. In
both networks each new node connects to $m = 2$ existing nodes. The $\gamma \to 2$ limit is not well-defined in PA, so that $\gamma = 2.1$
is used instead as described in the text. Plot (a) shows the probability $\Pi(k)$ that an existing node of degree $k$ attracts a new
link. The solid line is the theoretical prediction. Plot (b) shows the probability $p(x)$ that a pair of nodes located at hyperbolic
distance $x$ are connected. The average clustering (over all nodes) in the optimization and PA networks is $\bar{c} = 0.83$ and $\bar{c} = 0.12$,
respectively.



while in PA the probability of their connections is lower by an order of magnitude. On the other hand, far apart
nodes are never connected in the optimization model, as opposed to PA. These differences manifest themselves in the
strength of clustering, which is the probability that two neighbors of the same node are connected. In PA, clustering
is asymptotically zero [26], while it is strong in many real networks [5, 6]. We show in Section IV that the described
optimization model leads to the strongest possible clustering for networks with a given average degree and degree
distribution.

Clustering and the power-law exponent can both be adjusted to arbitrary values via the following model modifications.
We first consider the effect of popularity fading, observed in many real networks. We note that the
closer the node to the center in Fig. 1(c), the more popular it is: the more new connections it attracts, and the higher
its degree. This provides the intuition behind the emergence of preferential attachment. Therefore to model popularity
fading, we let all nodes drift away from the center such that the radial coordinate of node $s$ at time $t > s$ is increasing as
$r_s(t) = \beta r_s + (1 - \beta) r_t$, where $r_s = \ln s$ and $r_t = \ln t$, and parameter $\beta \in [0, 1]$. This modification is identical to
minimizing $s^{\beta}\theta_{st}$ (or $s^{b}\theta_{st}^{a}$ with $\beta = b/a$) instead of $s\theta_{st}$. It changes the power-law exponent to $\gamma = 1 + 1/\beta \geq 2$. If
$\beta = 1$, the nodes do not move and $\gamma = 2$. If $\beta = 0$, all nodes move with the maximum speed, always lying on the
circle of radius $r_t$, while the network degenerates to a random geometric graph growing on the circle. PA emerges at
any $\gamma = 1 + 1/\beta$ since the attraction probability $\Pi(k)$ is a linear function of degree $k$, $\Pi(k) \propto k + m(\gamma - 2)$, the same
as in PA [3]. We prove these statements in Sections IV, VII where we also show that the popular fitness model [10]
can be mapped to our geometric optimization framework by letting different nodes drift away with different speeds
(Section V).
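
The equivalence between the radial drift and the modified product can be checked in one line. The following worked step is a sketch that assumes the approximate distance formula quoted earlier, $x_{st} \approx r_s + r_t + \ln(\theta_{st}/2)$, remains valid when $r_s$ is replaced by the drifted coordinate $r_s(t)$:

```latex
\begin{align*}
x_{st} &\approx r_s(t) + r_t + \ln\frac{\theta_{st}}{2}
        = \beta \ln s + (1-\beta)\ln t + \ln t + \ln\frac{\theta_{st}}{2} \\
       &= \ln\!\left(\frac{s^{\beta}\, t^{\,2-\beta}\, \theta_{st}}{2}\right).
\end{align*}
```

For a fixed new node $t$ the factor $t^{2-\beta}/2$ is the same for every candidate $s$, so ranking existing nodes by $x_{st}$ is the same as ranking them by $s^{\beta}\theta_{st}$. Setting $\beta = 1$ recovers the original product $s\theta_{st}$.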

Since strongest clustering is due to connections to the closest nodes, to weaken clustering we allow connections
to farther nodes. Connecting to the $m$ closest nodes is approximately the same as connecting to nodes lying within
distance $R_t \approx r_t$, see Fig. 1(c) and Section IV where we derive the exact expression for $R_t$ fixing the average degree
in the network. If new nodes $t$ establish connections to existing nodes $s$ with probability $p(x_{st}) = 1/[1 + e^{(x_{st} - R_t)/T}]$,
where parameter $T \geq 0$ is the network temperature and $x_{st}$ is the hyperbolic distance between nodes $s$ and $t$, then
clustering is a decreasing function of temperature. That is, temperature is the parameter controlling clustering in the
network. At zero temperature, the connection probability $p(x_{st})$ is either 1 or 0 depending on whether distance $x_{st}$
is less or greater than $R_t$, so that we recover the strongest clustering case above, where new nodes connect only to
the closest existing nodes. Clustering gradually decreases to zero at $T = 1$, and remains asymptotically zero for any
$T > 1$ (Sections IV, VI). At high temperatures $T \to \infty$ the model degenerates either to growing random graphs, or,
remarkably, to standard PA (Section VII).
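
The behaviour of the connection probability at zero and non-zero temperature can be illustrated with a small function. This is a sketch only: the name `connection_probability` is hypothetical, and treating the boundary case $x = R$ at $T = 0$ as "not connected" is an assumption made here, since the text leaves that case unspecified.

```python
import math


def connection_probability(x, R, T):
    """p(x) = 1 / (1 + exp((x - R) / T)), with the T = 0 limit as a step."""
    if T == 0:
        # Zero temperature: connect only if strictly closer than R.
        # (Assumption: the boundary x == R counts as not connected.)
        return 1.0 if x < R else 0.0
    z = (x - R) / T
    # Evaluate the logistic function without overflowing exp().
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))
```

At $T = 0$ the function is a sharp threshold at $R$. As $T$ grows, pairs closer than $R$ connect with probability decreasing toward 1/2 and pairs farther than $R$ connect with probability rising toward 1/2, which is the smoothing of the connectivity perimeter discussed in the text.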

To investigate if similarity shapes the structure and dynamics of real networks as our model predicts, we consider a
series of historical snapshots of the Internet, E. coli metabolic network, and the web of trust between people. The first
two networks are disassortative, while the third is assortative, and its degree distribution deviates from power laws.
We map these networks to their popularity × similarity spaces (Methods Summary). The mapping infers the radial
(popularity) and angular (similarity) coordinates for all nodes, so that we can compute the hyperbolic distances
FIG. 3: Popularity × similarity optimization in the growing Internet (plot (a)), E. coli metabolic network (plot (b)), and
Pretty-Good-Privacy (PGP) web of trust between people (plot (c)). Each plot shows the probability of connections between new and
old nodes, as a function of the hyperbolic (popularity × similarity) distance $x$ between them in the real networks (circles and
squares) and in PA emulations (diamonds and triangles). To emulate PA, new links are disconnected from old nodes to which
these links are connected in the real networks, and reconnected to old nodes according to PA. For a pair of historical network
snapshots $S_0$ (older) and $S_1$ (newer), new nodes are the nodes present in $S_1$ but not in $S_0$, and old nodes are the nodes present
both in $S_1$ and $S_0$. Each plot shows the data for two pairs of such historical snapshots. The solid curve in each plot is the
theoretical connection probability in the optimization model with the parameters corresponding to a given real network. Since
the probability of new connections in the real networks is close to the theoretical curves, the shown data demonstrate that
these networks grow as the popularity × similarity optimization model predicts, while PA, accounting only for popularity, is off
by orders of magnitude in predicting the connections between similar (small $x$) or dissimilar (large $x$) nodes. To quantify this
inaccuracy, the insets show the ratio between the connection probabilities in PA emulations and in the real networks, i.e., the
ratios of the values shown by diamonds and circles, and by triangles and squares in the main plots. The x-axes in the insets
are the same as in the main plots.

between all node pairs, and the probability of new connections as a function of the hyperbolic distance between 
corresponding nodes. This probability is shown in Fig. 3. It is close to the theoretical prediction by the model. 
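
The empirical connection probability as a function of distance can be estimated by linear binning, as Section I.A describes for the Internet data. The sketch below is illustrative only: the function name, the `(distance, is_connected)` input format, and reporting each bin by its center are assumptions made here.

```python
import math


def empirical_connection_probability(pairs, bin_width):
    """Estimate p(x) from (distance, is_connected) pairs using linear bins.

    Returns a list of (bin_center, fraction_connected), one entry per
    non-empty bin, sorted by distance.
    """
    totals = {}
    connected = {}
    for x, is_connected in pairs:
        b = math.floor(x / bin_width)  # index of the linear bin holding x
        totals[b] = totals.get(b, 0) + 1
        connected[b] = connected.get(b, 0) + (1 if is_connected else 0)
    return [
        ((b + 0.5) * bin_width, connected[b] / totals[b])
        for b in sorted(totals)
    ]
```

Bins that contain no node pairs are simply absent from the result, which corresponds to the missing data points mentioned in the Methods Summary.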

This finding is important for several reasons. First, it shows that real-world networks evolve as our framework
predicts. Specifically, given the popularity and similarity coordinates of two nodes, they link with probability close
to the theoretical one in the model. The framework may thus be used for link prediction, a notoriously difficult and
important problem in many disciplines [29], with applications ranging from predicting protein interactions or terrorist
connections to designing recommender and collaborative filtering systems [30]. Second, Fig. 3 directly validates our
framework and its core mechanism. It is not surprising then that, as a consequence, the synthetic graphs that the model
generates are remarkably similar to real networks across a range of metrics (Section IX), implying that the framework
can also be used for veracious modeling of real network topologies. We review related work in Section X, and to the
best of our knowledge, there is no model that would simultaneously: (1) be simple and universal, i.e., applicable to
many different networks, (2) have a similarity space as its core component, (3) cast PA as an emergent phenomenon,
(4) generate graphs similar to real networks across a wide range of metrics, and (5) validate the proposed growth
mechanism directly. Validation is usually limited to comparing certain graph metrics, such as degree distribution,
between modeled and real networks, which "validates" a consequence of the mechanism, not the mechanism itself.
Direct validation is usually difficult because proposed mechanisms tend to incorporate many unmeasurable factors,
such as economic or political factors in Internet evolution. We cannot measure all the factors or node attributes
contributing to node similarity in any of the considered real networks either. Yet, the angular distances between
nodes in our approach can be considered as projections of properly weighted combinations of all such similarity
factors affecting network evolution, and we can infer these distances using statistical inference methods, directly
validating the growth mechanism.

To summarize, popularity is attractive, but so is similarity. Neglecting the latter would lead to severe aberrations.
In the Internet, for example, a local network in Nebraska would connect directly to a local network in Tibet, the
same way as in the Web, a person not even knowing Tartini or free soloing would suddenly link her page to these
subjects. The probability of such dissimilar connections is very low in reality, and the stronger the similarity forces, the
smaller this probability is. Neglecting the network similarity structure leads to overestimations or underestimations
of the probability of dissimilar or similar connections by orders of magnitude (Fig. 3). However, one cannot tell the
difference with preferential attachment by examining node degrees only. The probability that an existing node of
degree $k$ attracts a new link optimizing popularity × similarity is exactly the same linear function of $k$ as in preferential
attachment (Fig. 2(a)). Figure S1 shows that this function is indeed realized in the considered real networks,
revalidating effective preferential attachment for these networks. Therefore the popularity × similarity optimization
approach provides a natural geometric explanation for the following "dilemmas" with preferential attachment. On
one hand, preferential attachment has been validated for many real networks, while on the other hand, it requires
exogenous mechanisms to explain not only strong clustering, but also linear popularity preference, and how such
preference can emerge in real networks, where nodes do not have any global information about the network structure.
Since preferential attachment appears as an emergent phenomenon in the framework developed here, this framework
provides a simple and natural resolution to these dilemmas, and this resolution is directly validated against large-scale
evolution of drastically different real networks. We conclude with the observation that the knowledge of exactly the
closest nodes in the hyperbolic popularity × similarity space does require precise global information about all node
locations. However, non-zero temperatures smooth out the sharp connectivity perimeter threshold in Fig. 1(c), thus
modeling reality where this proximity information is not precise and mixed with errors and noise. In that respect,
preferential attachment is a limiting regime with similarity forces reduced to nothing but noise.



Methods Summary 



To infer the radial $r_i$ and angular $\theta_i$ coordinates for each node $i$ in a real network snapshot with adjacency matrix $a_{ij}$,
$i, j = 1, 2, \ldots, t$, we use the Markov Chain Monte Carlo (MCMC) method described in detail in Section II. Specifically,
we derive there the exact relation between the expected current degree $k_i$ of node $i$ and its current radial coordinate $r_i$,
which scales as $k_i \sim e^{r_t - r_i}$. To infer the radial coordinates we use the same expression substituting in it the real degrees
$k_i$ of nodes instead of their expected degrees. Having the radial coordinates inferred, we then execute the Metropolis-Hastings
algorithm to find the node angular coordinates that maximize likelihood $\mathcal{L} = \prod_{i<j} p(x_{ij})^{a_{ij}} [1 - p(x_{ij})]^{1 - a_{ij}}$,
where $p(x_{ij}) = 1/[1 + e^{(x_{ij} - R)/T}]$ is the connection probability in the model, and parameters $R$ and $T$ are defined by the
average node degree and clustering in the network via expressions in Section IV. Likelihood $\mathcal{L}$ is the probability that the
network snapshot with node coordinates $(r_i, \theta_i)$, defining the hyperbolic distances $x_{ij}$ between all nodes, is produced
by the model. The algorithm employs an MCMC process which finds coordinates $\theta_i$ for all $i$ that approximately
maximize $\mathcal{L}$. Further details are in Sections II, III where we also show that the method yields meaningful results for the
considered networks, but not for a network (movie actor collaborations) to which popularity × similarity optimization
does not apply.

In Fig. 3, the nodes in plots (a), (b), and (c) are Autonomous Systems (ASs), metabolites, and PGP certificates
of people. Parameters $(R, T)$ used to infer the coordinates and to draw the theoretical connection probability are
(25.2, 0.79), (14.4, 0.77), and (23, 0.59). Each plot shows data for two pairs of snapshots: plot (a) January, April,
2007, and April, June, 2009; plot (b) $S_0$, $S_1$, and $S_1$, $S_2$ defined in Section I; and plot (c) April, October, 2003, and
December 2005, December 2006. A few missing data points in the empirical curves (circles and squares) indicate that
there are no node pairs at the corresponding distances after the mapping, whereas extra missing points in the PA
emulation curves (diamonds and triangles) indicate that no node pairs at those distances are connected after PA
emulations, meaning that the PA connection probability is zero there.



I. REAL-WORLD NETWORKS



Here we provide details on the real-world network data used to validate the popularity × similarity optimization
approach. We have considered the AS Internet, the E. coli metabolic network, and the web of trust among people
extracted from Pretty-Good-Privacy (PGP) data. That is, we have validated our approach against three paradigmatic
real networks, from three different domains: technology, biology, and society.



A. Internet 



The Internet data used in Fig. 3(a) and Fig. S11 of Section IX are collected and prepared as follows. First, we
obtain 11 lists of all the autonomous systems (ASs) observed in a collection of Border Gateway Protocol (BGP) data
exactly as described in [31]. These AS lists are linearly spaced in time with the interval of three months: time $t = 0$
corresponds to January 2007, $t = 1$ is April 2007, and so on until $t = 10$, June 2009. We denote the obtained AS lists
by $L_t$. For any pair of $t$ and $t' > t$, we call the ASs present both in $L_t$ and $L_{t'}$ the old ASs, and the ASs present only
in $L_{t'}$ but not in $L_t$ the new ASs. The number of ASs in $L_0$ is 17258, while the numbers of new ASs in
$L_{t'}$ with $t' = 1, 2, \ldots, 10$ compared to $t = 0$ are 806, 1614, 2389, 3103, 3973, 4794, 5434, 5843, 6207, and 6426. We
then take the Archipelago AS topology [32] of June 2009, available at [33], and for each $t = 0, 1, \ldots, 10$ we remove
from it all ASs and their adjacent links that are not in $L_t$, thus obtaining a time series of historical AS topology
snapshots $S_t$. We then map each $S_t$ to the hyperbolic space as described in Section II, and for each $t = 0, 1, \ldots, 9$
and $t' = t + 1$ we compute the empirical probability $p(x)$ of connections between new and old ASs as a function of
hyperbolic distance $x$ between the ASs. To compute $p(x)$, we linearly bin distance $x$, and show in each bin the ratio
FIG. S1: Plots (a), (b), (c) show the probability $\Pi(k)$ that an old node attracts a new link in the AS Internet, the E. coli
metabolic network, and the PGP web of trust, respectively. The network snapshots in each case are the ones used in Fig. 3.
The plots also show the results for the corresponding PA emulations and the theoretical prediction, which is the same linear
function in popularity × similarity optimization and PA.

of the number of connected AS pairs to the total number of AS pairs located at hyperbolic distances falling within this
bin. To avoid clutter, Fig. 3(a) shows the results for the first and last pairs of consecutive snapshots, i.e., $S_0, S_1$ and
$S_9, S_{10}$. The number of new ASs in each pair is respectively 806 and 259. Similar results hold for all intermediate
snapshot pairs. The figure also shows the results of PA emulations. To emulate PA in a snapshot pair, the links
adjacent to a new AS are first disconnected from the old ASs to which these links are connected in reality, and then
reconnected to old ASs chosen randomly with the normalized probability $\propto k + \bar{k}(\gamma - 2)/2$, where $k$ is the number
of connections the AS has to other old ASs, and $\bar{k} = 5.3$, $\gamma = 2.1$, taken from the Internet. The average clustering
in the Internet is $\bar{c} = 0.61$, and both $\bar{k}$ and $\bar{c}$ are stable across the considered period. Figure S1(a) validates effective
preferential attachment for the pairs of Internet snapshots considered in Fig. 3(a).
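
The PA emulation step can be sketched as weighted sampling of old nodes. The code below is a hedged illustration rather than the procedure actually used: the function name is hypothetical, and drawing distinct targets (to avoid duplicate links from one new node) is an assumption made here, since the text does not say how repeated picks are handled. It also assumes the weights are positive, which holds for $\gamma > 2$.

```python
import random


def pa_emulation_targets(old_degrees, num_links, k_bar, gamma, rng):
    """Pick old nodes for a new node's links with probability
    proportional to k + k_bar * (gamma - 2) / 2.

    old_degrees maps each old node to k, its number of links to other
    old nodes. Targets are drawn without replacement (an assumption).
    """
    offset = k_bar * (gamma - 2) / 2
    remaining = dict(old_degrees)  # copy, so the input is left untouched
    chosen = []
    for _ in range(min(num_links, len(remaining))):
        nodes = list(remaining)
        weights = [remaining[v] + offset for v in nodes]
        pick = rng.choices(nodes, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen
```

With the Internet values quoted above one would call, for a new AS that has three links in reality, `pa_emulation_targets(old_degrees, 3, 5.3, 2.1, random.Random(0))`.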

B. E.coli metabolic network 

We use the bipartite metabolic network representation of the E. coli metabolism from [34], reconstructed from data in
the BiGG database, iAF1260 version of the K12 MG1655 [37] strain. The bipartite representation differentiates
two subsets of nodes, metabolites and reactions, mutually interconnected through unweighted and undirected links,
without self-loops or dead end reactions. Reactions that do not involve direct chemical transformations, such as
diffusion and exchange reactions, are avoided and isomer metabolites are differentiated. To enhance the resolution
of the mapping procedure, currency metabolites are eliminated (h, h2o, atp, pi, adp, ppi, nad, nadh, amp, nadp,
nadph), together with a few isolated reaction-metabolite pairs and reaction-metabolite-reaction triplets. This leads
to a globally connected set of 1512 reactions and 1010 metabolites. Starting from this bipartite network, we construct
its one-mode projection over the space of metabolites, that is, we consider only metabolites and declare two metabolites
as connected if they participate in the same reaction in the original bipartite network. The resulting unipartite network
of metabolites has a power-law degree distribution with exponent $\gamma = 2.5$, average degree $\bar{k} = 6.5$, and average
clustering $\bar{c} = 0.48$.
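
The one-mode projection described above can be sketched as follows. The input format (a mapping from a reaction identifier to the metabolites it involves) and the function name are assumptions made for illustration; they do not reflect the BiGG data format.

```python
from itertools import combinations


def project_onto_metabolites(reactions):
    """One-mode projection of a bipartite reaction-metabolite network.

    reactions maps a reaction id to an iterable of metabolite ids.
    Two metabolites are linked if they take part in the same reaction.
    Returns a set of undirected edges stored as sorted (u, v) tuples.
    """
    edges = set()
    for metabolites in reactions.values():
        # Every unordered pair of distinct metabolites in one reaction.
        for u, v in combinations(sorted(set(metabolites)), 2):
            edges.add((u, v))
    return edges
```

Because edges are stored in a set, two metabolites that share several reactions are still joined by a single unweighted link.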

Empirical data for ancestral metabolic networks are not available. However, it has been argued that there exists a
direct relation between the evolutionary history of metabolism and the connectivity of metabolites. The hypothesis
is that metabolic networks grew by adding new metabolites, such that the most highly connected metabolites should
also be the phylogenetically oldest [38–41]. Following this idea, we sorted the network of metabolites by degree to
construct an ancestor core metabolic network of 460 metabolites with degrees larger than 4, and two shells including
metabolites of degrees 4 and 3, respectively. Each shell is meant to represent the addition of new metabolites in
subsequent evolutionary steps. The first shell consists of 142 new metabolites and the second shell of 171 new
metabolites. Time $t = 0$ corresponds to the core network $S_0$. Time $t = 1$ corresponds to the snapshot of the topology
$S_1$ consisting of the metabolites in $S_0$ and the new metabolites in the first shell. Time $t = 2$ corresponds to the
snapshot of the topology $S_2$ consisting of the metabolites in $S_1$ and the new metabolites in the second shell. We map
$S_0$, $S_1$, $S_2$ to the hyperbolic space and compute the empirical connection probability, following the same procedure
as in the previous subsection for the Internet. As before, we also perform PA emulations. The results are shown in
Fig. 3(b) and in Fig. S1(b), and are very similar to those of Figs. 3(a), S1(a). The data from this section are also
used in Fig. S12 of Section IX.
C. PGP web of trust



Pretty-Good-Privacy (PGP) is a data encryption and decryption computer program that provides cryptographic
privacy and authentication for data communication [42]. The PGP web of trust is a directed network where nodes are
certificates consisting of public PGP keys and owner information. A directed link in the web of trust pointing from
certificate A to certificate B represents a digital signature by the owner of A endorsing the owner/public key association
of B. We use temporal PGP web of trust data collected and maintained by Jörgen Cederlöf [43].

The PGP web of trust (WoT) data are analyzed as follows. We consider two pairs of WoT snapshots that are closely
spaced in time, taken in April 2003, October 2003, and December 2005, December 2006. For each of the directed graphs we
form their undirected counterparts by taking into account only bi-directional trust links between the certificates. For
each of the undirected counterparts we isolate its largest connected component. Then, for each pair of the snapshots we
identify old and new sets of nodes. As before, the set of old nodes contains all nodes present in both snapshots, while
the set of new nodes contains nodes that are present in the newer snapshot and not in the older snapshot. We refer to
the obtained undirected connected subgraphs of the WoT as snapshots $S_0$, $S_1$, $S_2$, $S_3$, for April 2003, October 2003,
December 2005, December 2006, respectively. The numbers of nodes in $S_0$, $S_1$, $S_2$, $S_3$ are 14367, 17155, 23797, 26701,
while the average degree $\bar{k}$ is 5.3, 6.2, 7.9, 8.1 and the average clustering is $\bar{c} = 0.47$–$0.48$. The degree distribution
can be roughly approximated by a power law with exponent $\gamma = 2.1$, yet we observe some deviations from this
power law at high degrees, see Fig. S13(a). We map $S_0$, $S_1$, $S_2$, $S_3$ to the hyperbolic space and compute the empirical
connection probability, following the same procedure as in the previous two subsections. As before, we again perform
PA emulations. The results are shown in Fig. 3(c) and in Fig. S1(c), and are very similar to Figs. 3(a), 3(b) and
Figs. S1(a), S1(b). The data from this section are also used in Fig. S13 of Section IX.



By using the PGP data as described, we strengthen the social component of the WoT, since we only consider bi- 
directional signatures, i.e., pairs of users (owners of PGP keys) who have reciprocally signed each other's keys. This 
filtering process increases the probability that the connected users know each other, and makes the extracted network 
a reliable proxy to the underlying social network. We consider the PGP WoT since it is a massive evolving unipartite 
graph, which represents real social relationships of trust among individuals, and for which complete historical data is 
available. 
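
The preprocessing steps of this subsection (keep only reciprocated signatures, take the largest connected component, split nodes into old and new) can be sketched as below. All function names and the edge-list input format are assumptions made for illustration, not the format of the original WoT data.

```python
from collections import deque


def mutual_undirected_edges(directed_edges):
    """Keep a pair {a, b} only if both a -> b and b -> a are present."""
    directed = set(directed_edges)
    return {
        tuple(sorted((a, b)))
        for a, b in directed
        if a != b and (b, a) in directed
    }


def largest_connected_component(edges):
    """Return the node set of the largest connected component (BFS)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def split_old_new(older_nodes, newer_nodes):
    """Old nodes are in both snapshots; new nodes only in the newer one."""
    older, newer = set(older_nodes), set(newer_nodes)
    return older & newer, newer - older
```

Applying `mutual_undirected_edges` and then `largest_connected_component` to each dated snapshot gives the node sets of the snapshots, and `split_old_new` is then applied to each pair.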



II. INFERRING THE POPULARITY AND SIMILARITY COORDINATES 



Here we describe the network mapping method used to infer the popularity and similarity coordinates in the 
considered real networks. 

Given a snapshot of a real network consisting of $t$ nodes, we use the Markov Chain Monte Carlo (MCMC) method
described in [44] to compute the current radial (popularity) $r_s(t)$ and angular (similarity) $\theta_s$ coordinates for each node
$s$ in the network. In this section we briefly describe this method. See [44] for further details.

Inferring the radial coordinates is relatively easy. In Section IV we derive the exact relation between the expected
current degree $k_s(t)$ of node $s$ and its current radial coordinate $r_s(t)$ in the model, $k_s(t) \sim e^{r_t - r_s(t)}$, where $r_t$ is the
current radius of the hyperbolic disc. Therefore to infer the radial coordinates in a real network, we use the same
expression substituting in it the real degrees $k_s(t)$ of nodes instead of their expected degrees.
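
Inverting the scaling relation gives a simple recipe for the radial coordinates. The sketch below uses only the scaling form $k_s \sim e^{r_t - r_s}$ quoted above; the exact relation with its prefactor is derived in Section IV and is not reproduced here, so the constant `c` is a hypothetical placeholder, as is the function name.

```python
import math


def infer_radial_coordinates(degrees, r_t, c=1.0):
    """Invert k = c * exp(r_t - r) to get r = r_t - ln(k / c) per node.

    degrees: observed node degrees (all must be positive).
    r_t: current radius of the hyperbolic disc.
    c: placeholder proportionality constant (an assumption).
    """
    radii = []
    for k in degrees:
        if k <= 0:
            raise ValueError("degree must be positive to take its logarithm")
        radii.append(r_t - math.log(k / c))
    return radii
```

Higher-degree nodes end up closer to the center, which matches the statement in the main text that nodes nearer the center are more popular.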

The inference of the angular coordinates is much more involved. In summary, we first measure the average degree,
power-law exponent, and average clustering in the network to determine $m$, $\beta$, and $T$, and then execute the
Metropolis-Hastings algorithm trying to find the angular coordinates that would maximize the probability (or likelihood)

$$\mathcal{L} = \prod_{i<j} p(x_{ij})^{a_{ij}} \left[1 - p(x_{ij})\right]^{1 - a_{ij}}, \qquad (1)$$

$$p(x_{ij}) = \frac{1}{1 + e^{(x_{ij} - R)/T}}, \qquad (2)$$

that a given real network with adjacency matrix $a_{ij}$, $i, j = 1, 2, \ldots, t$, and with given node coordinates defining the
hyperbolic distances $x_{ij}$ between nodes, is produced by the model with the measured parameters. The algorithm
operates by repeating the following steps:

1. Compute the current likelihood $\mathcal{L}_c$;

2. Select a random node;

3. Move it to a new random angular location;

4. Compute the new likelihood $\mathcal{L}_n$;

5. If $\mathcal{L}_n > \mathcal{L}_c$, accept the move;

6. Otherwise, accept the move with probability $\mathcal{L}_n/\mathcal{L}_c$;

and some manual intervention and guidance are needed for this algorithm to actually succeed in a reasonable amount
of computing time [44].
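
The six steps above can be sketched as a plain Metropolis-Hastings loop. This is an illustration under stated assumptions, not the method of [44]: it uses the approximate distance $x = r_i + r_j + \ln(\theta_{ij}/2)$ from the main text, works with log-likelihoods for numerical stability, requires $T > 0$, and includes none of the manual guidance mentioned above. All function names are hypothetical. Because moving one node changes only the terms of Eq. (1) that involve it, the ratio $\mathcal{L}_n/\mathcal{L}_c$ is computed from that node's terms alone.

```python
import math
import random


def approx_distance(r_i, theta_i, r_j, theta_j):
    """Approximate hyperbolic distance x = r_i + r_j + ln(dtheta / 2)."""
    d = abs(theta_i - theta_j) % (2 * math.pi)
    dtheta = min(d, 2 * math.pi - d)
    return r_i + r_j + math.log(max(dtheta, 1e-12) / 2)  # guard ln(0)


def log_p_log_q(x, R, T):
    """Return (ln p, ln(1 - p)) for p = 1 / (1 + exp((x - R) / T)), T > 0."""
    z = (x - R) / T
    if z > 0:
        softplus = z + math.log1p(math.exp(-z))  # ln(1 + e^z), stable
    else:
        softplus = math.log1p(math.exp(z))
    return -softplus, z - softplus


def node_log_likelihood(i, theta_i, r, theta, adj, R, T):
    """Sum of the log-likelihood terms involving node i at angle theta_i."""
    total = 0.0
    for j in range(len(r)):
        if j == i:
            continue
        x = approx_distance(r[i], theta_i, r[j], theta[j])
        log_p, log_q = log_p_log_q(x, R, T)
        total += log_p if j in adj[i] else log_q
    return total


def total_log_likelihood(r, theta, adj, R, T):
    """Logarithm of Eq. (1): each pair i < j is counted once."""
    total = 0.0
    for i in range(len(r)):
        for j in range(i + 1, len(r)):
            x = approx_distance(r[i], theta[i], r[j], theta[j])
            log_p, log_q = log_p_log_q(x, R, T)
            total += log_p if j in adj[i] else log_q
    return total


def metropolis_hastings(r, theta, adj, R, T, steps, seed=0):
    """Repeat steps 1-6 and return the final list of angles."""
    rng = random.Random(seed)
    theta = list(theta)  # work on a copy
    for _ in range(steps):
        i = rng.randrange(len(r))                       # step 2
        proposal = rng.uniform(0.0, 2 * math.pi)        # step 3
        delta = (node_log_likelihood(i, proposal, r, theta, adj, R, T)
                 - node_log_likelihood(i, theta[i], r, theta, adj, R, T))
        # Steps 5-6: accept if better, else with probability L_n / L_c.
        if delta >= 0 or rng.random() < math.exp(delta):
            theta[i] = proposal
    return theta
```

Here `adj[i]` is the set of neighbors of node `i`, nodes are indexed from 0, and `r` holds the already inferred radial coordinates.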

Given a historical series of real network topology snapshots $S_0, S_1, S_2, \ldots$, we first map $S_0$ to the hyperbolic space
exactly as just described, i.e., we compute the current radial coordinate and angular position of each node. In the
previous section, we have considered a series of 11 AS Internet snapshots $S_0, S_1, \ldots, S_{10}$, a series of 3 E. coli metabolic
network snapshots $S_0, S_1, S_2$, and two series of two PGP web of trust snapshots $S_0, S_1$ and $S_2, S_3$. For new nodes in
consecutive snapshots of a series we compute their hyperbolic coordinates keeping the coordinates of old nodes fixed.
That is, once a node appears at some time and gets its coordinates computed, its coordinates never change. Although
according to the model the radial coordinates of nodes should increase with time (unless $\gamma = 2$), here, for simplicity,
we keep them fixed. This simplification is justified because the difference $\Delta r$ in the radial coordinate
of every node between snapshots with $t + \Delta t$ and $t$ nodes, which grows only logarithmically with $(t + \Delta t)/t$, is not
significant in the closely spaced snapshot series that we
consider. In particular, the maximum value of $\Delta r$ in the Internet, metabolic and PGP snapshot series is respectively
0.058, 0.35, 0.032. Another simplification is that since the old node coordinates are fixed, we compute the angular
coordinate for new nodes $i$ using only their local contributions to the total likelihood in Equation (1), i.e., instead
of (1) we use $\mathcal{L}_i = \prod_{j \neq i} p(x_{ij})^{a_{ij}} [1 - p(x_{ij})]^{1 - a_{ij}}$.

Figure 3 shows that the empirical connection probabilities between new and old nodes in the Internet, E.coli
metabolic network, and PGP web of trust, follow their theoretical predictions. These results signify that new
connections in these networks are established as our popularity × similarity optimization framework predicts.

III. DISCUSSION OF THE MAPPING METHOD 

Here we show that the mapping method yields meaningful results, without overfitting or other artifacts. 

The number of parameters in the model is large. It is proportional to the network size, since we have to infer 
coordinates for each node. Therefore a natural question that arises is whether the mapping method described above 
yields meaningful results. In particular, could it be the case that the good match between empirical and theoretical 
connection probabilities in Fig. 3 is due to overfitting? 

In this section we show that the inference results are indeed meaningful, since we find strong correlations between 
inferred coordinates and network-specific node attributes in each considered network. We also compute the logarithmic 
loss, which is the metric of the inference quality for statistical inference methods based on maximum-likelihood 
estimation. We show that this quality is good for each considered network, confirming that the results in Fig. 3 are 
not an (overfitting) artifact. Finally, we provide an example of a real network (IMDb) where this quality is poor, and
so is the logarithmic loss. Collectively, these results show that the inference method does not suffer from overfitting.
Indeed, if the method were overfitting, it would yield statistically good results for any network.

A. The mapping yields meaningful results 

1. Internet

In [44], where we study Internet routing, we use the method described in Section II to map the Archipelago AS
topology of June 2009, used in Section I A. The mapping yields meaningful results, since ASs belonging to the same
country are mapped close to each other, see Figs. 3 and 5 in [44]. More precisely, one can see from Figs. 3 and 5 in [44] that
for the majority of countries, their ASs are localized in narrow angular regions. That is, even though the mapping 
method is completely geography-agnostic, it discovers meaningful groups or communities of ASs belonging to the 
same country. 

The reason for this effect is that ASs belonging to the same country are usually connected more densely to each 
other than to the rest of the world, and the method correctly places all such ASs in narrow regions close to each other. 
We can also see from Fig. 3 in [44] that in many cases, geographically or politically close countries are located close 
to each other on the circle. These results prove that the angular coordinates inferred by the method reflect reality 
well, as is the case with the other two networks that we consider here.

FIG. S15: The mapping of the PGP web of trust yields meaningful results since PGP certificates belonging to the same country
code are mapped close to each other (plot (a)). Similar results hold for other country codes. By contrast, certificates belonging
to the generic top-level domain .net are widespread (plot (b)), as expected. In [44] we show that the method also yields
meaningful results for the Internet, and in [34] for the E.coli metabolic network. The horizontal axis in both plots is the
angular coordinate $\theta$.



2. E.coli metabolic network 



Distances in metabolic network maps give a measure of the chemical potential of metabolites to participate jointly
in reactions, such that higher reaction likelihoods are naturally associated with metabolites that are closer in the
underlying space. It is then expected that metabolites participating in reactions in the same biochemical pathway
would cluster in specific regions in the inferred space.

This was indeed observed in [34] for the cartographic network representation of the metabolism of E.coli, see
Figs. 2, 3 and 4 there. The geometric embedding of the metabolic network in Fig. 2 in [34], obtained by the same
mapping method we use in this paper, shows that metabolites that participate jointly in reactions are mapped close
to each other, i.e., in the same angular regions. In particular, pathways, classically understood as chains of
step-by-step reactions which transform a principal chemical into another, are in general strongly localized, even though
some adopt either a discrete bi-modal or a multi-peaked form, and only a very small fraction are spread transversally
over the circle (Fig. 3 in [34]). Furthermore, pathways in related functional categories tend to concentrate into
well-defined sectors (Fig. 4 in [34]). Therefore our model discriminates well the concentrated pathways, which are the most
frequent and are consistent with the classical view of modular subsystems, from others formed of subunits, and even from those
responsible for producing or consuming metabolites used extensively in many other pathways.



3. PGP web of trust 



To see if the mapping method yields meaningful results in the case of the PGP web of trust (WoT), we consider
its mapped topology of April 2003 from Section I C. For each node (PGP certificate) in the topology, the data we
use [13] contain the email address of the corresponding owner of the PGP certificate. For each email address we can
identify the top-level domain that the email address belongs to, which is the last part of the email address. Examples
of top-level domains are .com, .net, .org, .de, .fr, .it and other country codes. Therefore, for each node in the PGP
network we can identify the top-level domain that the node belongs to. If the inferred angular coordinates reflect 
reality well, then we expect PGP nodes belonging to the same country code to be mapped to angular locations close 
to each other, since in general people in the WoT are expected to trust other people from their own country more. 
By contrast, we do not expect this to be the case for generic top-level domains such as .net. In Fig. S15 we show that 
the mapping method indeed yields meaningful results, as expected. 



B. Overfitting considerations and logarithmic loss 

1. The number of parameters versus the number of predictions 



In general, a statistical inference method may suffer from overfitting if the number of parameters in the model is
comparable to or larger than the number of predicted parameters. Here we show that for any reasonably sized network,
the former is much smaller than the latter in our model.

Indeed, given a snapshot of a real network consisting of $t$ nodes, the mapping method in Section II finds the
angular coordinate of every node in the network such that the likelihood that the network is produced by the model
is maximized. That is, if there are $t$ nodes in the network the method infers $t$ parameters (angular coordinates).

However, we stress that if a network consists of $t$ nodes, then there are $O(t^2)$ node pairs in the network, and for
each node pair $i, j \leq t$, the model predicts an independent probability $p(x_{ij})$ of the existence of a link between this node
pair. If the mapping is successful, then every node pair $i, j \leq t$ is placed at the right hyperbolic distance. In
other words, if the fitting is successful, then with just $t$ parameters, the model manages to successfully make $O(t^2)$
predictions.

If we consider new nodes in a subsequent snapshot, the method infers their angular coordinates such that they are
all placed at the right hyperbolic distances with respect to the old nodes. If there are $\Delta t$ new nodes and $t$ existing
nodes, and the mapping is successful, then with just $\Delta t$ parameters, the model makes $O(t\Delta t)$ predictions.

If a real network is well described by the model, then the fitting of the large number of unknowns with a significantly 
smaller number of parameters is expected to be successful, as Fig. 3 illustrates. However, to compute the empirical 
connection probability in Fig. 3, we have to bin the hyperbolic distances into a small number of bins to have statistically 
reliable results for ratios of the number of connected node pairs to the total number of node pairs at distances within 
each bin. Instead of a large ensemble of graphs generated with the same parameters, we have only one real network, 
and we do not have any other method to compute the empirical connection probability for it. Therefore it is desirable
to assess the mapping quality using an appropriate metric independent of any binning. For maximum-likelihood
inference methods, such a metric is the logarithmic loss.



2. Logarithmic loss 
In general, the logarithmic loss [IS] is defined as 

$$L = -\log \mathcal{L}, \qquad (3)$$

where $\mathcal{L}$ is the likelihood. Since maximum-likelihood inference methods operate by maximizing likelihood, logarithmic
loss is a natural metric of the quality of the results that these methods produce. Specifically, if the results are good,
then logarithmic loss is small. To estimate how small is "small" here, one usually compares against the case with
random parameter assignments.

In our case, the likelihood $\mathcal{L}$ is defined in Equation (1). That is, for a given real network and a given set of inferred
coordinates, the logarithmic loss is

$$L = -\sum_{i<j} \left[ a_{ij} \log p(x_{ij}) + (1 - a_{ij}) \log\left(1 - p(x_{ij})\right) \right], \qquad (4)$$

where the above sum goes over all $O(t^2)$ pairs of nodes, where $t$ is the network size. That is, we stress that
logarithmic loss depends on all the $O(t^2)$ predicted probabilities $p(x_{ij})$. The logarithmic loss is nothing but the
absolute value of the logarithm of the probability that the network is generated by the model, given the set of inferred
node coordinates.
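As an illustration of Equation (4), the following short Python sketch computes the logarithmic loss from a list of adjacency entries and the corresponding predicted probabilities. The function name and the clipping constant `eps` are assumptions of this sketch, introduced only to keep the logarithms finite.

```python
import math

def logarithmic_loss(adjacency, probabilities):
    """Logarithmic loss of Equation (4).

    adjacency     -- iterable of 0/1 entries a_ij, one per node pair i < j
    probabilities -- iterable of predicted probabilities p(x_ij), same order
    """
    eps = 1e-12  # assumed clipping constant, avoids log(0)
    loss = 0.0
    for a, p in zip(adjacency, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        loss -= a * math.log(p) + (1 - a) * math.log(1.0 - p)
    return loss
```

For two losses $L$ and $L_{\rm rand}$ computed on the same network, the ratio of the corresponding likelihoods is $\exp(L_{\rm rand} - L)$, so a smaller loss means a more likely coordinate assignment.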

We compute logarithmic losses for the Internet, E.coli metabolic network, and the PGP web of trust, with the node
coordinates inferred by our mapping method. We contrast these logarithmic losses against those obtained for the
same networks with random angular coordinates. That is, we first assign to each node an angular coordinate drawn
uniformly at random from $[0, 2\pi]$. The randomized logarithmic loss is then

$$L_{\rm rand} = -\sum_{i<j} \left[ a_{ij} \log p(\tilde{x}_{ij}) + (1 - a_{ij}) \log\left(1 - p(\tilde{x}_{ij})\right) \right], \qquad (5)$$

where $\tilde{x}_{ij}$ is the hyperbolic distance between nodes $i$ and $j$ with random angular coordinates. The smaller the $L$
compared to $L_{\rm rand}$, the better the quality of the mapping, i.e., the better our model describes a given real network.

To test the robustness of the inferred coordinates we also calculate logarithmic losses after distorting each inferred angular
coordinate $\theta_i$ to

$$\tilde{\theta}_i = \theta_i + \delta\epsilon, \qquad (6)$$

where $\delta = 0.05, 0.1$ radians and $\epsilon$ is a random variable drawn uniformly from the interval $[-1, 1]$.

Network name | $L$ | $L$, $\delta = 0.05$ | $L$, $\delta = 0.1$ | $L_{\rm rand}$ | $\mathcal{L}/\mathcal{L}_{\rm rand}$
Internet (April 2007) | $1.4 \times 10^{?}$ | $1.8 \times 10^{?}$ | $2.3 \times 10^{?}$ | $2.7 \times 10^{?}$ | $\exp(1.3 \times 10^{?})$
E.coli metabolic ($S_0$) | $7.3 \times 10^{3}$ | $9.0 \times 10^{3}$ | $9.6 \times 10^{3}$ | $1.4 \times 10^{4}$ | $\exp(6.7 \times 10^{3})$
PGP web of trust (April 2003) | $6.9 \times 10^{4}$ | $1.7 \times 10^{5}$ | $2.4 \times 10^{5}$ | $3.0 \times 10^{5}$ | $\exp(2.3 \times 10^{5})$

TABLE I: Logarithmic losses calculated with the inferred coordinates, distorted angular coordinates with $\delta = 0.05, 0.1$ radians,
and fully randomized angular coordinates, for the Internet (April 2007 snapshot), E.coli metabolic network ($S_0$ snapshot), and
PGP web of trust (April 2003 snapshot). The last column shows the ratios of likelihoods $\mathcal{L}/\mathcal{L}_{\rm rand} = \exp(L_{\rm rand} - L)$, which
are the ratios of the probability that the network is produced by the model with the inferred angular coordinates, to the same
probability with all these coordinates being random.



Network name | $L$ | $L_{\rm rand}$ | $\mathcal{L}/\mathcal{L}_{\rm rand}$
Internet (Jan-Apr 2007) | 1100 | 1800 | $\exp(700)$
E.coli metabolic ($S_0$-$S_1$) | 400 | 600 | $\exp(200)$
PGP web of trust (Apr-Oct 2003) | 5900 | 9000 | $\exp(3100)$

TABLE II: Logarithmic loss calculated only for new-old pairs of nodes.



The logarithmic loss values are reported in Table I. From the table we observe that the logarithmic losses calculated
using the inferred angular coordinates are significantly smaller than those with random angular coordinates, indicating
that the considered real networks are well described by our model, corroborating the results in Fig. 3.

We also compute logarithmic losses considering only the links between new and old nodes. That is, given two
consecutive snapshots of a network $S_t$ and $S_{t+1}$, we define the logarithmic loss as

$$L = -\sum_{i\ {\rm new},\ j\ {\rm old}} \left[ a_{ij} \log p(x_{ij}) + (1 - a_{ij}) \log\left(1 - p(x_{ij})\right) \right], \qquad (7)$$

where summation is now over only $O(t\Delta t)$ new-old node pairs, and where $t$ is the number of old nodes, and $\Delta t$
the number of new nodes. Again, the logarithmic losses using the inferred angular coordinates of new nodes are
significantly smaller than those obtained using randomized angular coordinates, see Table II, signifying that new
connections in these networks are well described by the popularity × similarity optimization.

C. Example of a network that is not well described by the model 

We finally present an example of a real network for which our mapping method does not produce good results. 
Specifically, we consider the actor network from the Internet Movie Database (IMDb) [35]. To build the network 
we connect two actors if they have co-starred in at least one film, limiting our consideration only to films labeled as 
comedies. The largest connected subgraph of the resulting network in the year of 2000 consists of 44936 actors and
has average degree $\bar{k} = 13.6$ and average clustering $\bar{c} = 0.55$. This actor network is another example of a growing
network with strong clustering and heterogeneous node degrees. However, the mapping of this network using our
method is poor, as illustrated by the connection probability in Fig. S16.

We also compute the logarithmic loss for this network using the inferred node coordinates. The result is $L = 6.0 \times 10^{?}$.
This value is larger than the one obtained after randomizing the node angular coordinates, $L_{\rm rand} = 3.8 \times 10^{?}$, so that

$$\mathcal{L}/\mathcal{L}_{\rm rand} = \exp(L_{\rm rand} - L) = \exp(-2.2 \times 10^{?}).$$

The reason why our model does not describe the actor network well is the following. By construction, the network
is overinflated with fully connected subgraphs, since many modern film crews include hundreds of dissimilar actors.
Any pair of actors who participated at least once in the same large-scale film project are connected, leading to an
abundance of large cliques in the network. As a result of this overinflation, even not so famous actors, who may join
the project coming from many different countries, have high chances to be connected. That is, connections in this
network are not well described by popularity × similarity optimization, because even fairly dissimilar and unpopular
actors may be connected with high probability. Therefore, the fact that this network cannot be successfully mapped
by our method is quite expected.

FIG. S16: Connection probability for the actor network considered in Section III C.



IV. THE POPULARITY × SIMILARITY MODEL: ANALYSIS AND SIMULATIONS

In this section we discuss the formulation details of the popularity × similarity model, analyze its properties, and
verify them in simulations. We start with the simplest version of the model: (1) initially the network is empty; (2) at
time $t \geq 1$, new node $t$ appears having coordinates $(r_t, \theta_t)$, where $r_t = \ln t$, while $\theta_t$ is uniformly distributed on $[0, 2\pi]$,
and every existing node $s$, $s < t$, moves increasing its radial coordinate according to $r_s(t) = \beta r_s + (1 - \beta) r_t$ with
parameter $\beta \in [0, 1]$; and (3) node $t$ connects to the $m$ hyperbolically closest nodes $s$, $s < t$; at early times $t \leq m$,
node $t$ connects to all the existing nodes. The value of $m$ controls the average degree in the network, $\bar{k} = 2m$. The
hyperbolic distance between two points $(r_s, \theta_s)$ and $(r_t, \theta_t)$ is given by [25]

$$x_{st} = \frac{1}{2}\,{\rm arccosh}\left(\cosh 2r_s \cosh 2r_t - \sinh 2r_s \sinh 2r_t \cos\theta_{st}\right) \approx r_s + r_t + \ln(\theta_{st}/2), \quad {\rm where}\ \theta_{st} = \pi - |\pi - |\theta_s - \theta_t||.$$

This expression gives the distance between two points on the hyperbolic plane of curvature $K = -4$ [25]. The model
can be generalized for any curvature value (see Section VI) without affecting the results, since changing the value of
curvature corresponds to simple rescaling of all distances, thus preserving the distance-induced ordering of nodes, e.g.,
the sets of $m$ closest nodes, etc. We call the above model Model1.
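The growth rule of Model1 and the distance formula above can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than the simulation code behind the figures; the function names and the brute-force search for the $m$ closest nodes are assumptions of this sketch.

```python
import math
import random

def angular_separation(theta_s, theta_t):
    """theta_st = pi - |pi - |theta_s - theta_t||, for angles in [0, 2*pi]."""
    return math.pi - abs(math.pi - abs(theta_s - theta_t))

def hyperbolic_distance(r_s, theta_s, r_t, theta_t):
    """Exact distance on the hyperbolic plane of curvature K = -4."""
    dtheta = angular_separation(theta_s, theta_t)
    arg = (math.cosh(2 * r_s) * math.cosh(2 * r_t)
           - math.sinh(2 * r_s) * math.sinh(2 * r_t) * math.cos(dtheta))
    return 0.5 * math.acosh(max(arg, 1.0))  # clamp guards against round-off

def approximate_distance(r_s, theta_s, r_t, theta_t):
    """Approximation r_s + r_t + ln(theta_st / 2); needs theta_st > 0."""
    return r_s + r_t + math.log(angular_separation(theta_s, theta_t) / 2.0)

def grow_model1(n_nodes, m, beta, seed=0):
    """Grow a network node by node; returns (angles, edges) with edges (s, t), s < t."""
    rng = random.Random(seed)
    angles = []  # angles[s - 1] is the angular coordinate of node s
    edges = []
    for t in range(1, n_nodes + 1):
        r_t = math.log(t)
        theta_t = rng.uniform(0.0, 2.0 * math.pi)
        candidates = []
        for s in range(1, t):
            # Existing nodes drift outward: r_s(t) = beta * r_s + (1 - beta) * r_t.
            r_s_now = beta * math.log(s) + (1.0 - beta) * r_t
            x = hyperbolic_distance(r_s_now, angles[s - 1], r_t, theta_t)
            candidates.append((x, s))
        candidates.sort()
        for _, s in candidates[:m]:  # all existing nodes while fewer than m exist
            edges.append((s, t))
        angles.append(theta_t)
    return angles, edges
```

The brute-force search makes the sketch quadratic in the number of nodes, which is adequate for small illustrative networks only.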

We show in Section IV B that clustering is strongest possible in the networks generated by Model1. To weaken
clustering we allow connections to nodes farther apart. To do so, we modify step (3) of Model1 as follows: (3) new
node $t$ picks a randomly chosen node $s$, $s < t$, and given that it is not already connected to it, it connects to it with
probability $p(x_{st}) = 1/[1 + e^{(x_{st} - R_t)/T}]$, where parameter $T$ is called network temperature, and $R_t \sim r_t$; the exact
value of $R_t$ is specified below. Node $t$ repeats this step until it gets connected to $m$ nodes. The connection probability
$p(x_{st})$ is nothing but the Fermi-Dirac distribution [37]. We call this model Model2.

We also show in Section IV B that clustering is a decreasing function of temperature, and that at zero temperature
we recover the strongest clustering case, where new nodes connect to the hyperbolically closest existing nodes. But
first we show that for any $\beta \in (0, 1)$ both models produce scale-free networks with the power-law degree distribution
identical to the degree distribution in networks growing according to preferential attachment (PA) [4], and having
power-law exponent $\gamma = 1 + \frac{1}{\beta}$.
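The modified step (3) of Model2 can be illustrated with the following Python sketch. It assumes the hyperbolic distances from the new node to the existing nodes are already available in a dictionary, and it takes $R_t$ as an input because its exact value is only specified later in the text. The names `connect_new_node` and `max_tries`, and the overflow guard, are assumptions of this sketch.

```python
import math
import random

def connection_probability(x_st, R_t, T):
    """Fermi-Dirac connection probability 1 / (1 + exp((x_st - R_t) / T))."""
    if T == 0.0:
        return 1.0 if x_st < R_t else 0.0  # zero-temperature step function
    z = (x_st - R_t) / T
    if z > 700.0:  # exp would overflow; the probability is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def connect_new_node(distances, m, R_t, T, rng, max_tries=1_000_000):
    """Pick random existing nodes until the new node has m links.

    distances maps each existing node to its hyperbolic distance from the
    new node. Returns the set of existing nodes that received a link.
    """
    nodes = list(distances)
    target = min(m, len(nodes))  # early nodes cannot reach m links
    linked = set()
    tries = 0
    while len(linked) < target:
        tries += 1
        if tries > max_tries:  # safety stop, not part of the model
            raise RuntimeError("could not place all links; check R_t and T")
        s = rng.choice(nodes)
        if s in linked:
            continue  # already connected: pick again
        if rng.random() < connection_probability(distances[s], R_t, T):
            linked.add(s)
    return linked
```

At small $T$ the probability approaches a step function at $x_{st} = R_t$, so links concentrate on nodes within distance $R_t$.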



A. Degree distribution 

We start with Model1. Consider new node $t$, let $R_t$ be the radius of a hyperbolic disc centered at this node, and
let it connect to all nodes $s$, $s < t$, that lie within this disc. The probability that there is a connection to node $s$ is

$$P(x_{st} < R_t) \approx P\left(\theta_{st} < 2e^{-(r_s(t) + r_t - R_t)}\right) = \frac{2}{\pi} e^{-(r_s(t) + r_t - R_t)}. \qquad (8)$$

The average number of existing nodes lying within $R_t$ is

$$N(R_t) = \int_1^t P(x_{it} < R_t)\,di = \frac{2}{\pi} e^{-(r_t - R_t)} \int_1^t e^{-r_i(t)}\,di = \frac{2}{\pi} e^{-(r_t - R_t)} \frac{1}{1 - \beta}\left(1 - e^{-(1-\beta) r_t}\right). \qquad (9)$$

Therefore

$$R_t = r_t - \ln\left[\frac{2\left(1 - e^{-(1-\beta) r_t}\right)}{\pi N(R_t)(1 - \beta)}\right] \qquad (10)$$

is the radius of the hyperbolic disc centered at node $t$, which contains on average the closest $N(R_t)$ existing nodes.
Setting $N(R_t) = m$ and substituting $R_t$ from Equation (10) into Equation (8), we find the probability that an existing
node that appeared at time $s$ attracts a link from a new node $t$, if node $t$ connects on average to the $m$ closest existing
nodes

$$\Pi(r_s(t)) \equiv P(x_{st} < R_t) = m \frac{e^{-r_s(t)}}{\frac{1}{1-\beta}\left(1 - e^{-(1-\beta) r_t}\right)}. \qquad (11)$$

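The above equation also holds if the new node $t$ always connects to exactly $m$ closest nodes. Further, since
$\frac{1}{1-\beta}\left(1 - e^{-(1-\beta) r_t}\right) = \int_1^t e^{-r_i(t)}\,di$, we can rewrite Equation (11) as

$$\Pi(r_s(t)) = m \frac{e^{-r_s(t)}}{\int_1^t e^{-r_i(t)}\,di} = m \frac{e^{-(\beta r_s + (1-\beta) r_t)}}{\int_1^t e^{-(\beta r_i + (1-\beta) r_t)}\,di} \equiv \Pi_{\rm Model1}(s,t). \qquad (12)$$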



We now recall how connections are made in PA [4], where at sufficiently large times $t$, an existing node $s$ with
degree $k_s(t)$ attracts a link from a new node $t$ with probability

$$\Pi(k_s(t)) = m \frac{k_s(t) - m + A}{(m + A)t}, \qquad (13)$$

where $m$ is the number of existing nodes that each new node connects to, $A = (\gamma - 2)m$ is a parameter called initial
attractiveness of each node, and $\gamma$ is the exponent of the target power law degree distribution. Notice that since each
new node brings $m$ connections, at large times $t$ the denominator in Equation (13) can be written as

$$(m + A)t = \int_1^t \left(k_i(t) - m + A\right) di. \qquad (14)$$

Further, it has been shown [4] that

$$\bar{k}_s(t) = m + A\left[\left(\frac{s}{t}\right)^{-\beta} - 1\right], \qquad (15)$$

where $\beta = \frac{1}{\gamma - 1}$, $\beta \in (0, 1)$.

The connection probability given by Equation (13) is conditioned on the exact value of the degree of the node $s$,
$k_s(t)$. Therefore, the unconditional probability that an existing node $s$ attracts a link from a new node $t$, which can
be obtained by Equation (13) after replacing $k_s(t)$ with its expected value, is

$$\Pi(\bar{k}_s(t)) = m \frac{\bar{k}_s(t) - m + A}{\int_1^t \left(\bar{k}_i(t) - m + A\right) di} = m \frac{(s/t)^{-\beta}}{\int_1^t (i/t)^{-\beta}\,di} \equiv \Pi_{\rm PA}(s,t). \qquad (16)$$

From Equations (12) and (16) we conclude that

$$\Pi_{\rm Model1}(s,t) = \Pi_{\rm PA}(s,t). \qquad (17)$$
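The equality of the two attraction probabilities can be checked numerically. The Python sketch below evaluates Equation (12) in closed form, using $r_t = \ln t$ and $r_s(t) = \beta r_s + (1-\beta) r_t$, and Equation (13) with the expected degree of Equation (15). The function names are assumptions of this sketch, and the agreement is expected only at large $t$, where the approximation behind Equation (14) holds.

```python
import math

def attraction_model1(s, t, m, beta):
    """Equation (12): m * exp(-r_s(t)) / integral_1^t exp(-r_i(t)) di, for beta < 1."""
    r_t = math.log(t)
    r_s_now = beta * math.log(s) + (1.0 - beta) * r_t
    integral = (1.0 - math.exp(-(1.0 - beta) * r_t)) / (1.0 - beta)
    return m * math.exp(-r_s_now) / integral

def attraction_pa(s, t, m, gamma):
    """Equation (13) evaluated at the expected degree of Equation (15), gamma > 2."""
    beta = 1.0 / (gamma - 1.0)
    A = (gamma - 2.0) * m  # initial attractiveness
    k_bar = m + A * ((s / t) ** (-beta) - 1.0)
    return m * (k_bar - m + A) / ((m + A) * t)

if __name__ == "__main__":
    gamma, m = 3.0, 3
    beta = 1.0 / (gamma - 1.0)
    for s in (10.0, 1e3, 1e5):
        print(s, attraction_model1(s, 1e8, m, beta), attraction_pa(s, 1e8, m, gamma))
```

The relative difference between the two expressions is of order $t^{-(1-\beta)}$, so it shrinks as the network grows.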



This means that for fixed $m$ and $\beta = \frac{1}{\gamma - 1}$ the probability that an existing node $s$, $s < t$, attracts a link from a
new node $t$, is the same in Model1 and PA. This, in turn, means that the resulting degree distribution in Model1 is
identical to PA, i.e., it is the same power law with exponent $\gamma = 1 + \frac{1}{\beta}$, whose exact expression is given by [1]

$$P(k) = (\gamma - 1) \frac{\Gamma[(m + 1)(\gamma - 2) + 1]\,\Gamma[k + m(\gamma - 3)]}{\Gamma[m(\gamma - 2)]\,\Gamma[k + m(\gamma - 3) + \gamma]}. \qquad (18)$$

Further, knowing the current degree $k$ of a node, the node attracts a link from a new node $t$ with probability as in
Equation (13)

$$\Pi(k) = m \frac{k - m + A}{(m + A)t}. \qquad (19)$$

Probabilities $P(k)$ and $\Pi(k)$ are both defined for $k \geq m$. Finally, using Equation (15), we can deduce that

$$\bar{k}_s(t) = m + A\left[e^{-(r_s(t) - r_t)} - 1\right]. \qquad (20)$$



In contrast to PA where the case $\gamma = 2$ is problematic [4], there are no problems with $\gamma = 2$ in Model1, where
$\gamma = 2$ corresponds to $\beta = 1$, i.e., to the case where nodes do not move. It is easy to check that for $\beta \to 1$,
$\int_1^t e^{-r_i(t)}\,di = \frac{1}{1-\beta}\left(1 - e^{-(1-\beta) r_t}\right) \to r_t$, and Equations (9), (10), and (11) are all well defined.

We now move to Model2, and show that the same results with respect to the degree distribution hold there as well.
Recall that in Model2 a new node $t$, instead of connecting to the $m$ closest nodes, picks a random existing node $s$,
$s < t$, and given that it is not already connected to it, it connects to it with probability $p(x_{st}) = 1/[1 + e^{(x_{st} - R_t)/T}]$. It
then repeats this procedure until it gets connected to $m$ nodes. Notice that at large times $t \gg m$, the probability that
node $t$ selects a random node $s$ to which it is already connected, is insignificant and can be ignored to ease analysis.
Further, notice that the probability $p(x_{st})$ can be also written as

$$p(x_{st}) = \frac{1}{1 + \left(X(s,t)\frac{\theta_{st}}{2}\right)^{1/T}}, \quad {\rm where}\ X(s,t) = e^{r_s(t) + r_t - R_t}. \qquad (21)$$



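Since node $t$ picks a random existing node and $\theta_{st}$ is uniformly distributed in $[0, \pi]$, the probability that node $t$
connects to node $s$ is

$$P(s,t) = \frac{1}{t}\,\frac{1}{\pi}\int_0^{\pi} \frac{d\theta_{st}}{1 + \left(X(s,t)\frac{\theta_{st}}{2}\right)^{1/T}} \approx \frac{1}{t}\,\frac{2T}{\sin T\pi}\,\frac{1}{X(s,t)}. \qquad (22)$$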



The approximation in Equation (22) holds for $T < 1$. Now, the probability that node $t$ connects to any node is

$$P(t) = \int_1^t P(i,t)\,di. \qquad (23)$$

Since node $t$ brings $m$ new links, then at sufficiently large times $t$, the probability that node $s$ attracts a link is

$$\Pi_{\rm Model2}(s,t) = m \frac{P(s,t)}{P(t)} = m \frac{e^{-r_s(t)}}{\int_1^t e^{-r_i(t)}\,di} = \Pi_{\rm Model1}(s,t) = \Pi_{\rm PA}(s,t). \qquad (24)$$

This means that for fixed $m$ and $\beta = \frac{1}{\gamma - 1}$ the degree distribution and link attraction probability in Model2 are the
same as in Model1, i.e., given by Equations (18) and (19). The limit $\beta \to 1$ is also well defined.
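The approximation in Equation (22) can be checked numerically. The Python sketch below compares the angular average $\frac{1}{\pi}\int_0^{\pi} d\theta / [1 + (X\theta/2)^{1/T}]$, computed with a simple midpoint rule, against the closed form $\frac{2T}{X \sin T\pi}$; the common factor $1/t$ is omitted. The function names and the number of quadrature points are assumptions of this sketch.

```python
import math

def angular_average(X, T, n=200_000):
    """Midpoint-rule estimate of (1/pi) * integral_0^pi d(theta) / (1 + (X*theta/2)**(1/T))."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        total += 1.0 / (1.0 + (X * theta / 2.0) ** (1.0 / T))
    return total * h / math.pi

def closed_form(X, T):
    """Right-hand side of Equation (22) without the 1/t factor, for 0 < T < 1."""
    return 2.0 * T / (math.sin(math.pi * T) * X)

if __name__ == "__main__":
    for T in (0.3, 0.5, 0.7):
        print(T, angular_average(1000.0, T), closed_form(1000.0, T))
```

The two values agree closely when $X$ is large, which is the regime of large radial coordinates; the agreement degrades as $T$ approaches 1, where the neglected tail of the integral becomes significant.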

Notice that as $T \to 0$, $p(x_{st}) \to 1$ if $x_{st} < R_t$, and $p(x_{st}) \to 0$ if $x_{st} > R_t$. In this case, setting $R_t$ as in Equation
(10) with $N(R_t) = m$, constrains the connections of a new node $t$ to its $m$ hyperbolically closest nodes, and Model2
becomes identical to Model1. In Model2, we can also compute the average number of existing nodes lying within $R_t$
from a new node $t$

$$N(R_t) = tP(t) = \frac{2T}{\sin T\pi}\, e^{-(r_t - R_t)}\, \frac{1}{1 - \beta}\left(1 - e^{-(1-\beta) r_t}\right). \qquad (25)$$

Therefore, in analogy to Model1, setting $N(R_t) = m$ we can fix $R_t$

$$R_t = r_t - \ln\left[\frac{2T}{\sin T\pi}\,\frac{1 - e^{-(1-\beta) r_t}}{m(1 - \beta)}\right]. \qquad (26)$$



Equation (26) is valid for $0 < T < 1$, and for $T \to 0$ it becomes Equation (10) with $N(R_t) = m$, as expected.
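A short Python sketch of Equations (10), (25), and (26) makes the consistency between them explicit: plugging the radius of Equation (26) into Equation (25) returns $m$, and the Model2 radius approaches the Model1 radius as $T \to 0$. The function names are assumptions of this sketch; it requires $t > 1$, $\beta < 1$, and $0 < T < 1$.

```python
import math

def radius_model1(t, m, beta):
    """Equation (10) with N(R_t) = m."""
    r_t = math.log(t)
    factor = 2.0 * (1.0 - math.exp(-(1.0 - beta) * r_t)) / (math.pi * m * (1.0 - beta))
    return r_t - math.log(factor)

def radius_model2(t, m, beta, T):
    """Equation (26)."""
    r_t = math.log(t)
    factor = (2.0 * T / math.sin(math.pi * T)
              * (1.0 - math.exp(-(1.0 - beta) * r_t)) / (m * (1.0 - beta)))
    return r_t - math.log(factor)

def expected_links(t, R_t, beta, T):
    """Equation (25): average number of nodes a new node t connects to."""
    r_t = math.log(t)
    return (2.0 * T / math.sin(math.pi * T) * math.exp(-(r_t - R_t))
            * (1.0 - math.exp(-(1.0 - beta) * r_t)) / (1.0 - beta))

if __name__ == "__main__":
    t, m, beta = 1000, 3, 0.5
    for T in (0.1, 0.5, 0.9):
        R_t = radius_model2(t, m, beta, T)
        print(T, R_t, expected_links(t, R_t, beta, T))
```

Since $\frac{2T}{\sin T\pi}$ grows with $T$, the radius $R_t$ shrinks as temperature increases, compensating for the links made to nodes beyond $R_t$.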

Figure S2 shows simulation results for Model2, and Fig. 2(a) with Fig. S3(a) show simulation results for Model1,
validating our analysis. Figure S3(b) also shows that clustering is strong in networks growing according to
popularity × similarity optimization, as opposed to PA. We study clustering in the next section.



B. Clustering 



We have shown that networks grown according to popularity × similarity optimization have an effective hyperbolic
geometry underneath, from which power-law degree distributions emerge. We now show that the metric property
of this geometry, i.e., the triangle inequality, leads to strong clustering in these networks, i.e., the large number of
triangular subgraphs.

FIG. S2: Plot (a) shows the probability $\Pi(k)$ that an existing node of degree $k$ attracts a link in networks grown up to $t = 10^4$
nodes according to Model2, with $T = 0.5$, $m = 3$, and $\gamma = 2.1, 3.0$. The plot also shows the corresponding theoretical predictions
given by Equation (19). Plot (b) shows the distribution of node degrees in the same networks. The theoretical predictions
are given by Equation (18). Small deviations of the theoretical prediction for $\gamma = 2.1$ are due to the increasingly pronounced
finite-size effects at $\gamma \to 2$ [38]. Similar results hold for other values of $\gamma > 2$, $0 < T < 1$, and $m$, not shown to avoid clutter.

Intuitively, if node $a$ is hyperbolically close to a node $b$, and $b$ is close to a third node $c$, then $a$ is also close to $c$
because of the triangle inequality. Since all three nodes are close to each other, links between all of them forming
triangle $abc$ exist with high probability. This probability depends on the value of the temperature $T \in [0, 1)$.

We show next that average clustering at time $t$, $\bar{c}(t)$, is a decreasing function of temperature: clustering is maximized
at $T = 0$, and it gradually decreases to zero as $T \to 1$.



1. Analysis



FIG. S3: Plot (a) shows the distribution $P(k)$ of node degrees for the two networks considered in Fig. 2. Plot (b) shows for
the same two networks the average clustering $\bar{c}(k)$ of $k$-degree nodes, defined as the ratio of the number of triangles involving
a $k$-degree node to the maximum such number $k(k - 1)/2$, averaged over all the $k$-degree nodes. The $1/k$ scaling of $\bar{c}(k)$ is
often considered as a signature of the network's hierarchical organization [49]. The average clustering $\bar{c} = \sum_k \bar{c}(k)P(k)$ in the
optimization and PA networks is $\bar{c} = 0.83$ and $\bar{c} = 0.12$, respectively, as mentioned in Fig. 2.

Let $c(s,t)$ be the average clustering of node $s$ at time $t$. Then

$$\bar{c}(t) = \frac{1}{t}\int_1^t c(s,t)\,ds, \qquad (27)$$

where $c(s,t)$ is given by [50]

$$c(s,t) = \frac{T_s(t)}{\bar{k}_s(t)\left(\bar{k}_s(t) - 1\right)/2}, \qquad (28)$$

and $T_s(t)$ is the expected number of triangles that contain node $s$ at time $t$, while $\bar{k}_s(t)$ is node $s$'s expected degree given
by Equation (15). To compute $T_s(t)$ we break it into two parts: (i) $T_s^{\rm old}$, which is the expected number of triangles
formed when node $s$ appeared, i.e., by connections from node $s$ to existing pairs of connected nodes; and (ii) $T_s^{\rm new}(t)$,
which is the expected number of triangles formed by new nodes appearing after node $s$, i.e., by connections from new
nodes to old pairs of connected nodes where one of the nodes is node $s$. Clearly, $T_s(t) = T_s^{\rm old} + T_s^{\rm new}(t)$.
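On a concrete network, the clustering of a node is measured as the number of triangles through it divided by $k(k-1)/2$, as in Equation (28) and the caption of Fig. S3. A minimal Python sketch, assuming the network is stored as a dictionary mapping each node to the set of its neighbors (a representation chosen here for illustration), is:

```python
from itertools import combinations

def local_clustering(adj, node):
    """Triangles through node divided by the maximum possible number k(k-1)/2."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here for nodes of degree 0 or 1
    triangles = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return triangles / (k * (k - 1) / 2)

def average_clustering(adj):
    """Average of the local clustering over all nodes."""
    return sum(local_clustering(adj, node) for node in adj) / len(adj)

if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant node 3 attached to node 0.
    adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(average_clustering(adj))
```

Assigning zero clustering to nodes of degree below 2 is one common convention; other conventions exclude such nodes from the average.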

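The probability that two nodes $s < t$ are connected in Model2, given the hyperbolic distance $x_{st}$ between them,
is $m \frac{p(x_{st})/t}{P(t)} = m \frac{p(x_{st})}{m} = p(x_{st})$, i.e., they connect with probability given by Equation (21). Introducing notation
$\chi_{st} = X(s,t)\frac{\theta_{st}}{2}$, we can write

$$p(x_{st}) = \frac{1}{1 + \chi_{st}^{1/T}} \equiv p(\chi_{st}). \qquad (29)$$

Since $T < 1$, the function $p(\chi)$ is integrable

$$I \equiv \int_0^{\infty} \frac{d\chi}{1 + \chi^{1/T}} = \frac{\pi T}{\sin \pi T}. \qquad (30)$$

Further, since $X(s,t) = e^{r_s(t) + r_t - R_t}$, with $R_t$ given by Equation (26), we can also write

$$X(s,t) = \frac{2T}{\sin T\pi} f(s,t), \quad {\rm where}\ f(s,t) = \frac{1 - e^{-(1-\beta) r_t}}{m(1 - \beta)}\, e^{r_s(t)} = \frac{t^{1-\beta} - 1}{m(1 - \beta)\, s^{-\beta}}. \qquad (31)$$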

Now, the probability that three nodes $s, t', t'' < t$ form a triangle, is the probability that the three nodes are connected.
Let $\theta_{t'}$, $\theta_{t''}$ be the angular coordinates of nodes $t'$ and $t''$ respectively, and $\theta_s$ be the angular coordinate of node $s$.
As the angular coordinate is uniformly distributed, we can set without loss of generality $\theta_s = 0$. Therefore, with
$\theta_{st'} = \theta_{t'}$, $\theta_{st''} = \theta_{t''}$, and $\theta_{t't''} = \theta_{t''t'} = \theta_{t'} - \theta_{t''}$, it is easy to see that
1 



det.det->pi\xt'A)p{\xt"t'm\xt"s\). 



.91 dt' dt" 

dt'Udt"! I det.d9t»p{\xst'm\xt"t'm\xt"s\) 



1 

ct' 



dt" 



det'rf0t"P(lxst'IMIxt"t'IMIx.t"l) 



(32) 



— 77 J —IT 



Changing the $\theta$ integration variables in Equation (32) to the corresponding $x$ variables gives



1 



dt' 



dt" 



{2IY J, f{t',s)J, f{t",s) 

,If{t\s) rlf{t"..s) 

X / dx' dx"p{\x'\)p{f{t",t') 

J-If{t',s) J-If{t",s) 



X' 



X" 



fit',s) fit",s) 
(33)



which cannot be written as a closed-form expression. However, these equations allow us to infer the relationship
between the clustering strength of the network and parameter $T$. As $T \to 1$, $I \to \infty$, and therefore, $T_s^{\rm old} \to 0$,
$T_s^{\rm new}(t) \to 0$, $\forall s, t$, meaning that clustering goes to zero. As $T \to 0$, $I \to 1$, and clustering is maximized. To see this,
consider the node with the smallest degree, i.e., the node that appeared at time $s = t$, whose degree is $k_t(t) = m$.
Clearly, $T_t^{\rm new}(t) = 0$. To compute $T_t^{\rm old}$, observe that when $T \to 0$, $p(\chi) \to \Theta(1 - \chi)$, and therefore, the inner integrals
taken over the variables $x'$, $x''$, reduce to the area of intersection of the square defined by $\{|x'| < 1; |x''| < 1\}$, and the
stripe $f(t'',t')\left|\frac{x'}{f(t',t)} - \frac{x''}{f(t'',t)}\right| < 1$. For most of the combinations of $t', t''$ the stripe is so wide that it fully contains
the square whose area is 4, yielding at large $t$ a value of $T_t^{\rm old}$ close to its maximum. Given Equation (28), this means that
$c(t,t) \approx 1$, proving that clustering is maximized at the zero temperature. Recall that clustering cannot be equal to its
maximum possible value of 1 for all node degrees because of structural constraints imposed by power-law degree
distributions [51]. For arbitrary values of $s < t$ we need to compute $T_s^{\rm new}(t)$, but the inner integration region defined by the
$x'$, $x''$ variables in the expression for $T_s^{\rm new}(t)$ (33) depends on the exact mutual relationship between $s$, $t'$, and $t''$, making
the analytic computation unfeasible. However, one can check that $c(s,t)$ increases as $s$ increases, and that average clustering
decreases almost linearly with $T \in (0, 1)$.



2. Simulations 



Figure S4 shows average clustering in simulated networks. As predicted by our analysis, clustering decreases as T 
increases, and vanishes as T approaches 1. Clustering is also the stronger, the smaller the 7. 

To confirm that zero temperature yields the strongest possible clustering (modulo fluctuations), we perform the following experiment. We grow three networks up to $t = 1000$ nodes according to Model1 with $\gamma = 2.1, 2.5, 3.0$ and $m = 3$. The average clustering in these networks is $\bar{c} = 0.83, 0.76, 0.72$, respectively. For each network we then perform a number of random link rewirings that preserve the degree distribution in the network and try to increase its clustering if possible [52]. Specifically, we select a random pair of links A-B and C-D in the network, and rewire them to A-D and B-C, provided that neither of these new links already exists in the network and that the rewiring does not decrease clustering. If these two conditions are met, the rewiring is accepted; otherwise it is aborted, and a new pair of links is selected. This way each accepted rewiring step preserves the degree distribution in the network, and can only increase its average clustering. For each network we run the experiment until 2000 rewiring steps are accepted, measuring the new average clustering $\bar{c}_{new}$ every 100 accepted rewirings. Figure S5 shows the results. From the figure, we observe only a minor increase of clustering from its original value, quickly reaching saturation as the number of accepted rewirings increases, as expected. After 2000 accepted rewirings the average clustering is $\bar{c}_{new} = 0.86, 0.81, 0.78$ for $\gamma = 2.1, 2.5, 3.0$.

FIG. S4: Average clustering $\bar{c}(t)$ at $t = 10^4$ as a function of temperature $T \in [0, 1)$ in networks grown according to popularity×similarity optimization with $m = 3$.

FIG. S5: Average clustering as a function of the number of accepted clustering-increasing rewirings in networks grown according to Model1 with $m = 3$.
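The rewiring experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is an undirected adjacency dictionary with integer node labels, the helper names (`average_clustering`, `swap`, `try_rewire`, `rewire_until`) are hypothetical, and average clustering is recomputed from scratch at every attempt, which is simple but slow for large networks.

```python
import random
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering over all nodes; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            closed = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            total += 2.0 * closed / (k * (k - 1))
    return total / len(adj)


def swap(adj, old_pairs, new_pairs):
    """Remove old_pairs and add new_pairs in an undirected adjacency dict."""
    for u, v in old_pairs:
        adj[u].discard(v)
        adj[v].discard(u)
    for u, v in new_pairs:
        adj[u].add(v)
        adj[v].add(u)


def try_rewire(adj, rng):
    """Attempt one rewiring A-B, C-D -> A-D, B-C; return True if accepted."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    (a, b), (c, d) = rng.sample(edges, 2)
    if rng.random() < 0.5:  # pick a random orientation for the second link
        c, d = d, c
    # Reject if the two links share a node or a new link already exists.
    if len({a, b, c, d}) < 4 or d in adj[a] or c in adj[b]:
        return False
    before = average_clustering(adj)
    swap(adj, [(a, b), (c, d)], [(a, d), (b, c)])
    if average_clustering(adj) < before:
        swap(adj, [(a, d), (b, c)], [(a, b), (c, d)])  # abort: undo the swap
        return False
    return True


def rewire_until(adj, n_accept, rng, max_attempts=100000):
    """Attempt rewirings until n_accept are accepted or attempts run out."""
    accepted = 0
    for _ in range(max_attempts):
        if accepted >= n_accept:
            break
        accepted += try_rewire(adj, rng)
    return accepted
```

Every accepted swap removes one link and adds one link at each of the four nodes involved, so node degrees never change, and a swap that lowers average clustering is undone before the next attempt.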



C. Connecting to nodes within distance Rt, and densification 



We now consider a variant of the popularity×similarity model, where a new node $t$, instead of connecting to exactly $m$ existing nodes as in Model2, looks at every existing node $s$, $s < t$, only once and connects to it with probability $p(x_{st})$ given by Equation (21). We call this variant Model2'. In this case, the probability that node $s$ attracts a link from node $t$ is $\Pi_{Model2'}(s,t) = tP(s,t)$, where $P(s,t)$ is given by Equation (22). The average number of nodes that node $t$ connects to is $N(R_t) = \int_1^t \Pi_{Model2'}(i,t)\,di = t\int_1^t P(i,t)\,di = tP(t)$, with $P(t)$ given by Equation (23). That is, $N(R_t)$ is given again by Equation (25) and can be fixed to $m$ by setting $R_t$ as in Equation (26). Further, since

$$t = \frac{N(R_t)}{P(t)} = \frac{m}{P(t)},$$

we have

$$\Pi_{Model2'}(s,t) = m\,\frac{P(s,t)}{P(t)}. \qquad (34)$$

That is, Model2' is equivalent to Model2 (cf. Eq. (24)) with the difference that in Model2' a new node $t$ connects on average to $m$ existing nodes.
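A minimal sketch of one growth step of Model2' follows, for illustration only. It assumes the default curvature, for which $r_t = \ln t$ and $x_{st} \approx r_s(t) + r_t + \ln(\theta_{st}/2)$, the non-fitness radial drift $r_s(t) = \beta r_s + (1-\beta) r_t$, a uniformly random angle for the new node, and $T > 0$. The radius `R_t` is passed in rather than computed, and all function names are hypothetical.

```python
import math
import random


def angular_distance(theta_a, theta_b):
    """Angle between two points on the circle, in [0, pi]."""
    return math.pi - abs(math.pi - abs(theta_a - theta_b) % (2 * math.pi))


def connection_probability(x, R_t, T):
    """p(x) = 1 / (1 + exp((x - R_t) / T)), evaluated without overflow (T > 0)."""
    z = (x - R_t) / T
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def model2prime_step(nodes, t, R_t, T, beta, rng):
    """Add node t: visit every existing node once, link with probability p(x_st).

    nodes maps birth time s -> angle theta_s; returns (theta_t, list of targets).
    """
    theta_t = rng.uniform(0.0, 2.0 * math.pi)  # assumed: uniform angle
    r_t = math.log(t)
    targets = []
    for s, theta_s in nodes.items():
        # Current radial coordinate of s after popularity fading.
        r_s_now = beta * math.log(s) + (1.0 - beta) * r_t
        dtheta = max(angular_distance(theta_s, theta_t), 1e-12)
        x_st = r_s_now + r_t + math.log(dtheta / 2.0)
        if rng.random() < connection_probability(x_st, R_t, T):
            targets.append(s)
    nodes[t] = theta_t
    return theta_t, targets
```

Because each existing node is tested independently, the number of targets fluctuates from step to step and equals $m$ only on average, which is the distinction from Model2 drawn in the text.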

Parameter $T \in [0, 1)$ can be used again to tune clustering. As $T \to 0$ a new node $t$ connects only to all nodes within distance $R_t$ from it, and we have a variant of Model1 where clustering is maximized. Indeed, in this case, the probability that node $s$ attracts a link from a new node $t$ is given again by Equation (8), which means that Equations (9), (10), (11) and (12) hold here as well. The difference here is that the new node $t$ connects to the closest nodes, whose average number is $m$.

FIG. S6: Plot (a) shows the probability $\Pi(k)$ that an existing node of degree $k$ attracts a link in networks grown up to $t = 10^4$ nodes according to Model2', with $T = 0.5$ and $\gamma = 2.1, 3.0$. Each new node connects on average to $m = 3$ existing nodes. The theoretical predictions are given by Equation (19) when $k \ge m$, and when $k < m$ are given by the formula $\Pi(k) = m\,[\ldots]$. Plot (b) shows the distribution $P(k)$ of node degrees in the same networks. The theoretical predictions are given by Equation (18), which is defined only for $k \ge m$. Compared to Fig. S2, here we observe stronger deviations of the distributions from the power laws at small degrees $k$, due to fluctuations of the initial degree of a node around its average value $m = 3$. Similar results hold for other values of $\gamma > 2$, $0 \le T < 1$, and $m$, not shown to avoid clutter.

To quantify the difference between Model2 and Model2', we need to consider the distribution of the number of existing nodes that a new node $t$ connects to in Model2', and to check how narrowly distributed this number is around its average value $m$. The connection events are statistically independent, so that the number of connections to existing nodes is a sum of independent Bernoulli trials with different success probabilities $\Pi_{Model2'}(s,t)$. Hence, the distribution of $N(R_t)$ follows the Poisson-Binomial distribution with average $m$ and variance $\sigma^2(t) \approx \int_1^t \left(1 - \Pi_{Model2'}(i,t)\right)\Pi_{Model2'}(i,t)\,di$. We do not use strict equality in the formula for $\sigma^2(t)$ as we replace the summation with the integration to ease the calculations. Performing the integration we can see that

$$\sigma^2(t) \approx m - g(m,\beta,t), \qquad (35)$$

where $g(m,\beta,t)$ is a function of $m$, $\beta$, and $t$ that goes to zero as $t \to \infty$. Therefore as $t \to \infty$ the variance $\sigma^2(t)$ approaches $m$, which is the variance of a Poisson distribution with the average equal to $m$. Indeed, by Le Cam's theorem [53],

$$\sum_{k=0}^{\infty}\left|\Pr(N(R_t) = k) - \frac{m^k e^{-m}}{k!}\right| < 2\int_1^t \left(\Pi_{Model2'}(i,t)\right)^2 di \to 0 \text{ as } t \to \infty,$$

and therefore the distribution of $N(R_t)$ converges to the Poisson distribution with the average $m$.
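The Poisson-Binomial argument above can be checked numerically with a short sketch. The code below is illustrative only: it assumes success probabilities proportional to $i^{-\beta}$ and normalised to sum to $m$ (a hypothetical stand-in for $\Pi_{Model2'}(i,t)$), computes the exact distribution of the number of successes by dynamic programming, and compares its distance from the Poisson distribution with the Le Cam bound $2\sum_i p_i^2$.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p_i) trials (O(n^2) DP)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf


def poisson_pmf(mean, k_max):
    """Poisson probabilities for k = 0..k_max, built iteratively."""
    out = [math.exp(-mean)]
    for k in range(1, k_max + 1):
        out.append(out[-1] * mean / k)
    return out


def attraction_probabilities(t, m, beta):
    """Assumed p_i proportional to i^(-beta) for i = 1..t-1, summing to m."""
    weights = [i ** (-beta) for i in range(1, t)]
    norm = sum(weights)
    return [m * w / norm for w in weights]


def compare_with_poisson(probs):
    """Return (mean, variance, L1 distance to Poisson, Le Cam bound)."""
    mean = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    exact = poisson_binomial_pmf(probs)
    approx = poisson_pmf(mean, len(probs))
    l1 = sum(abs(a - b) for a, b in zip(exact, approx))
    le_cam_bound = 2.0 * sum(p * p for p in probs)
    return mean, variance, l1, le_cam_bound
```

Calling `compare_with_poisson(attraction_probabilities(t, 3, 0.5))` for growing `t` should show the variance approaching the mean and the bound shrinking, in line with the convergence to a Poisson distribution.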

The simulation results in Fig. S6 confirm the analysis above. Figure S7 shows the simulation results for the average clustering as a function of temperature, where the behavior is similar to Fig. S4 as expected. Finally, in Fig. S8 we repeat the same experiment with the same parameter values as in Fig. S5, verifying that networks grown according to Model2' with $T = 0$ have maximum possible clustering.



Finally, if the connection disc radius is $R_t = r_t$ instead of Equation (26), then the average degree is not constant $\bar{k} = 2m$, but grows with the network size $t$, an effect known as network densification [54]. Specifically, the average degree in this case is given by

$$\bar{k}(t) = \frac{4T}{\sin T\pi}\,\frac{1}{1-\beta}\left[1 - \frac{1}{t} + \frac{1}{\beta t} - \frac{1}{\beta t^{1-\beta}}\right], \qquad (36)$$

where $\gamma = 1 + \frac{1}{\beta}$ is the exponent of the degree distribution as before. We see that the average degree grows logarithmically with the network size if $\gamma = 2$. More generally, if $R_t = \delta r_t$ with $\delta > 1$, then we have

$$\bar{k}(t) = \frac{4T}{\sin T\pi}\,\frac{1}{1-\beta}\left[\frac{t^{\delta-1}}{\delta} - \frac{1}{t\delta} + \frac{1}{(\beta+\delta-1)t} - \frac{1}{(\beta+\delta-1)t^{2-\delta-\beta}}\right], \qquad (37)$$

so that for large $t$ and $\gamma = 2$, the average degree grows polynomially with the network size, $\bar{k}(t) \sim t^{\delta-1}\ln t$, if $\delta > 1$. In this case the average shortest path distance and effective diameter do not increase but decrease with the network size, thus reproducing the shrinking diameter effect [54], see Fig. S9.

FIG. S7: Average clustering $\bar{c}(t)$ at $t = 10^4$ as a function of temperature $T \in [0, 1)$ in networks growing according to Model2', where each new node connects on average to $m = 3$ existing nodes. Clustering is calculated excluding nodes of degree 1, whose clustering is always zero.

FIG. S8: Average clustering as a function of the number of accepted clustering-increasing rewirings for networks grown according to Model2' with $T = 0$ and $m = 3$.



V. CONNECTION TO THE FITNESS MODEL 



In this section we consider the popular fitness model [10] and show that it can also be mapped to our geometric optimization framework.

The main motivation behind the fitness model is that in some real networks the popularity of a node does not depend only on its birth time, but also on its ability (fitness) to compete for links. Examples include the Web, where new sites may attract considerably more links than old ones, social networks where new individuals may have more friends, and citation networks where new research papers may acquire a large number of citations quickly.

FIG. S9: Densification effects. Plot (a) shows the average degree $\bar{k}(t)$ as a function of the size $t$ of networks grown according to Model2', with $T = 0.5$, $\gamma = 2.1$, and $R_t = r_t$. Plot (b) shows the average shortest path distance between nodes as a function of the size $t$ of networks grown as in Plot (a) but with $R_t = \delta r_t$, and $\delta = 1.2$. The plot also shows the effective diameter defined as the 90th percentile of the shortest path distance distribution [54].

To account for the different ability of nodes to compete for links in the fitness model [10], the following attraction probability is introduced

$$\Pi_{fitness}(s,t) = m\,\frac{\eta_s\left(k_{\eta_s}(t) - m + A\right)}{\int_1^t \eta_i\left(k_{\eta_i}(t) - m + A\right)di}, \qquad (38)$$

which is a variant of Equation (13). Equation (38) says that the probability that an existing node $s$, $s < t$, attracts a link from a new node $t$ depends both on the node's current degree $k_{\eta_s}(t)$ and on its fitness $\eta_s$. Fitness $\eta_s \in (0, \eta_{max}]$ is a parameter assigned to each incoming node $s$, which remains unchanged in time and follows some distribution $\rho(\eta)$ [10]. Given the fitness of each node, the attraction probability in Equation (38) is conditioned on the exact value of the degree of the node $k_{\eta_s}(t)$, and the unconditional probability can be obtained after replacing $k_{\eta_s}(t)$ with its expected value $\bar{k}_{\eta_s}(t)$. We thus have

$$\Pi_{fitness}(s,t) = m\,\frac{\eta_s\left(\bar{k}_{\eta_s}(t) - m + A\right)}{\int_1^t \eta_i\left(\bar{k}_{\eta_i}(t) - m + A\right)di}. \qquad (39)$$

Switching to our geometric optimization framework, to account for the fact that the popularity of different nodes can be changing differently with time, we let nodes move with different speeds. That is, our model and its variants remain exactly the same, with the only difference that every existing node $s$, $s < t$, now drifts away by increasing its radial coordinate using the formula $r_s(t) = \beta(\eta_s) r_s + (1 - \beta(\eta_s)) r_t - \ln[\ldots]$. Parameter $\beta(\eta_s)$ is some function of the fitness $\eta_s$ of node $s$, and therefore its value can be different for different nodes. We call this variant Model3.

Following exactly the same steps as in our earlier analysis, e.g., for Model2, we can see that

for $N(R_t) = m$.

(41)

Parameter $T \in [0, 1)$ can be used again to tune clustering, and the limit $T \to 0$ is again well defined.

The integral $I(t) = \int_1^t \eta_i \left(\frac{i}{t}\right)^{-\beta(\eta_i)} di$ is in general a random variable that depends on the sequence of $\eta_i$'s, $i \in (1, t)$, and on the function $\beta(\eta)$. As in [10], we compute the expected value of $I(t)$

$$\bar{I}(t) = \int_1^t \int_0^{\eta_{max}} \eta \left(\frac{i}{t}\right)^{-\beta(\eta)} \rho(\eta)\,d\eta\,di \approx tC \text{ for large } t, \qquad (42)$$

where $C = \int_0^{\eta_{max}} \rho(\eta)\,\frac{\eta}{1-\beta(\eta)}\,d\eta$, and assume that $I(t) \approx \bar{I}(t)$. We then get from Equation (40) that

(43)

Using Equation (43) we compute the average degree of an existing node $s$ at time $t$, given its fitness $\eta_s$

(44)

Observe that Equation (44) is similar to Equation (15) with the difference that the exponent is $\beta(\eta_s)$ instead of $\beta$; however, we again have $\bar{k}_{\eta_s}(t) \sim e^{r_t - r_s(t)}$. Using Equation (44) in (39) we can see that

$$\Pi_{Model3}(s,t) = \Pi_{fitness}(s,t). \qquad (45)$$

This means that for $m$, $A$, $\rho(\eta)$ fixed, and $\beta(\eta) = \frac{m\eta}{AC}$, the probability that node $s$ attracts a link from a new node $t$ is the same between Model3 and the fitness model, which in turn means that the resulting degree distribution is the same. The degree distribution $P(k)$ is a weighted sum of different power laws, which can be computed following the approach in [10]

(46)



Note that the attraction probability we consider in Equation (38) is more general than the one used in [10] and degenerates to it when $A = m$. In this case, we see that $\bar{k}_{\eta_s}(t) = m\left(\frac{t}{s}\right)^{\beta(\eta_s)}$.

We conclude this section with some additional observations. As in [10], we conclude from Equation (44) that the exponent $\beta(\eta_s)$ is bounded, i.e., $0 < \beta(\eta_s) < 1$ $\forall s$, since a node always increases the number of links attached to it with time, $\beta(\eta_s) > 0$, and $\bar{k}_{\eta_s}(t)$ cannot increase faster than $t$, $\beta(\eta_s) < 1$. This means that $r_s(t) = \beta(\eta_s) r_s + (1 - \beta(\eta_s)) r_t - \ln[\ldots] > 0$, $\forall s$, as needed. Further, with $\beta(\eta) = \frac{m\eta}{AC}$ and $A = (\gamma - 2)m$, the value of $C$ is computed by the following equation

$$1 = (\gamma - 2)\int_0^{\eta_{max}} \rho(\eta)\,\frac{1}{\frac{(\gamma-2)C}{\eta} - 1}\,d\eta. \qquad (47)$$

Since $\beta(\eta) = \frac{\eta}{(\gamma-2)C} < 1$ the singularity in the above integral is never reached and we also see that $\eta_{max} < (\gamma - 2)C$.

Finally, when $\rho(\eta) = \delta(\eta - \bar{\eta})$, i.e., all fitness values are equal to some $\bar{\eta}$, $C = \frac{\gamma-1}{\gamma-2}\bar{\eta}$ and $\beta(\eta) = \beta = \frac{1}{\gamma-1}$ as expected, since in this case $\Pi_{Model3}(s,t) = \Pi_{fitness}(s,t) = \Pi_{PA}(s,t)$, i.e., the degree distribution is the same as if the network was growing according to standard preferential attachment with power-law degree distribution exponent $\gamma$.

VI. EXTENSIONS FOR ANY CURVATURE AND TEMPERATURE



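The general formula that gives the hyperbolic distance between two points $(r_s, \theta_s)$ and $(r_t, \theta_t)$ for any value of hyperbolic space curvature $K = -\zeta^2$, $\zeta > 0$, is [25]

$$x_{st} = \frac{1}{\zeta}\,\mathrm{arccosh}\left(\cosh\zeta r_s \cosh\zeta r_t - \sinh\zeta r_s \sinh\zeta r_t \cos\theta_{st}\right) \approx r_s + r_t + \frac{2}{\zeta}\ln(\theta_{st}/2). \qquad (48)$$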
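As a quick numerical illustration of Equation (48), the sketch below evaluates the exact expression and its approximation, together with the curvature-dependent connection probability $1/[1 + e^{\zeta(x_{st} - R_t)/(2T)}]$ used in this section. The function names are hypothetical, $T > 0$ is assumed, and the direct use of `cosh` and `sinh` overflows once $\zeta r$ exceeds roughly 700, so this is not meant for very large radial coordinates.

```python
import math


def hyperbolic_distance(r_s, r_t, dtheta, zeta=2.0):
    """Exact distance (hyperbolic law of cosines) for curvature K = -zeta**2."""
    arg = (math.cosh(zeta * r_s) * math.cosh(zeta * r_t)
           - math.sinh(zeta * r_s) * math.sinh(zeta * r_t) * math.cos(dtheta))
    # Guard against rounding just below 1, where acosh is undefined.
    return math.acosh(max(arg, 1.0)) / zeta


def hyperbolic_distance_approx(r_s, r_t, dtheta, zeta=2.0):
    """Approximation r_s + r_t + (2/zeta) ln(dtheta/2), for dtheta > 0."""
    return r_s + r_t + (2.0 / zeta) * math.log(dtheta / 2.0)


def connection_probability(x, R_t, T, zeta=2.0):
    """p(x) = 1 / (1 + exp(zeta (x - R_t) / (2 T))), without overflow (T > 0)."""
    z = zeta * (x - R_t) / (2.0 * T)
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))
```

With `zeta=2.0` the approximation reduces to $r_s + r_t + \ln(\theta_{st}/2)$ and the connection probability to $1/[1 + e^{(x_{st} - R_t)/T}]$. The approximation is accurate for large radial coordinates and small angles and degrades as $\theta_{st}$ approaches $\pi$.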



The popularity×similarity model with $T \in [0, 1)$ can be extended to any $0 < \zeta < \infty$ with the following two simple modifications: (i) the initial radial coordinate of each new node $t \ge 1$ is $r_t = \frac{2}{\zeta}\ln t$ (instead of $\ln t$); and (ii) given the hyperbolic distance $x_{st}$ between new node $t$ and existing node $s$, node $t$ connects to $s$ with probability $p(x_{st}) = 1/[1 + e^{\zeta(x_{st} - R_t)/(2T)}]$ (instead of $p(x_{st}) = 1/[1 + e^{(x_{st} - R_t)/T}]$). Redoing the analysis for Model2 (or Model2') it is easy to check that exactly the same results hold, and that $R_t$ is now given by the more general formula

$$R_t = r_t - \frac{2}{\zeta}\ln\left[\frac{2T}{\sin T\pi}\,\frac{1 - e^{-\frac{\zeta}{2}(1-\beta) r_t}}{m(1-\beta)}\right], \qquad (49)$$

with the limit $T \to 0$ again well defined. The expected degree of node $s$ at time $t$ in this case is $\bar{k}_s(t) \propto e^{\frac{\zeta}{2}(r_t - r_s(t))}$.

The extension for any $T > 1$ is a bit more involved, but we need it for the next section where we consider interesting high-temperature limits. The point $T = 1$ is a phase transition and for $T > 1$ the approximation in Equation (22) giving $P(s,t)$ no longer holds. In particular, after performing the change of variables as in Equation (29), we see that the corresponding integral (Equation (30)) diverges, and we explicitly have to cut off the integration at a maximum value set by $X(s,t)$. This yields for $T > 1$

$$P(s,t) = \frac{T}{t(T-1)}\,\frac{1}{[X(s,t)]^{1/T}}, \qquad (50)$$

with $X(s,t) = e^{\frac{\zeta}{2}(r_s(t) + r_t - R_t)}$. In this high-temperature regime, the model has the same attraction probability and degree distribution as in the low-temperature regime $T < 1$ if the initial radial coordinate of each new node $t \ge 1$ is $r_t = \frac{2T}{\zeta}\ln t$ instead of $r_t = \frac{2}{\zeta}\ln t$, yielding

(51)

We can now allow $\zeta \to \infty$ if at the same time $T \sim \zeta \to \infty$. The main difference compared to $T < 1$ is that clustering is asymptotically zero for any $T > 1$. We have confirmed this effect and all the expressions in this section in simulations.

VII. CONNECTIONS TO PREFERENTIAL ATTACHMENT, GROWING RANDOM GRAPHS, AND GROWING RANDOM GEOMETRIC GRAPHS



In this section we show that standard PA with asymptotically zero clustering [26], growing random graphs [55], and growing random geometric graphs [56] can all be seen as limiting degenerate cases of popularity×similarity optimization.

To see the connection to standard PA, we need to consider the general formula that gives the hyperbolic distance $x_{st}$ between two points $(r_s, \theta_s)$ and $(r_t, \theta_t)$ for any value of hyperbolic space curvature $K = -\zeta^2$, $\zeta > 0$, given by Equation (48). By letting curvature go to minus infinity, $\zeta \to \infty$, we transform the hyperbolic space to a tree [25], and kill the $\theta$-dependent term in the expression for $x_{st}$ (48), i.e., the term abstracting the similarity distance. That is, the hyperbolic distance between nodes depends only on their popularity, $x_{st} = r_s + r_t$, as in PA. We can now set $T \sim \zeta$, e.g., $T = \frac{\zeta}{2}$ without loss of generality. This setting yields $r_t = \frac{2T}{\zeta}\ln t = \ln t$, and Equation (51) becomes

$$R_t = r_t - \ln\left[\frac{1 - e^{-(1-\beta) r_t}}{m(1-\beta)}\right]. \qquad (52)$$

Further, from Equation (50), the connection probability is now

(53)

so that the probability that node $s$ attracts a link from node $t$ is again

$$\Pi(s,t) = m\,\frac{P(s,t)}{P(t)} = m\,\frac{e^{-r_s(t)}}{\int_1^t e^{-r_i(t)}\,di} = m\,\frac{s^{-\beta}}{\int_1^t i^{-\beta}\,di}.$$

As before, this means that the degree distribution is a power law with exponent $\gamma = 1 + \frac{1}{\beta}$, but clustering is zero since $T \to \infty$. Figure S10 shows simulation results validating our analysis.

FIG. S10: Distribution $P(k)$ of node degrees in networks growing according to the standard PA limit with $\gamma = 2.1, 3.0$. The theoretical predictions are given by Equation (18). For $\gamma = 2.1, 3.0$ the average clustering in the simulated networks is $\bar{c} = 0.068, 0.004$.

If we now let $\beta \to 0$, then $\gamma \to \infty$, and the generated networks degenerate to growing random graphs. Indeed, if $\beta = 0$, $r_s(t) = r_t$, $\forall s,t$, i.e., all nodes have the same popularity as they all lie on the circle of the maximum radius $r_t$, expanding with time. It is easy to check that the attraction probability is now $\Pi(s,t) = \frac{m}{t-1} \approx \frac{m}{t}$, $\forall s,t$. This probability is similar to the connection probability in classical random graphs $G_{N,p}$ [55], where each of the $N(N-1)/2$ pairs of $N$ nodes is connected with the same probability $p \approx \bar{k}/N$. The difference is that our graphs are growing, which affects their properties including the degree distribution. The degree distribution in these growing graphs is exponential [57], versus the Poissonian distribution in classical random graphs.

The limit $\beta \to 0$ ($\gamma \to \infty$) also exists at low temperatures $T \in [0, 1)$ with finite clustering controlled by $T$. In this case, we can check that the attraction probability is still $\approx \frac{m}{t}$, $\forall s,t$, as all nodes are equally popular, but clustering is not zero, as similarity (the angular distance between nodes) matters. When $T = 0$ we have the strongest clustering, and the generated networks degenerate to growing random geometric graphs on the circle. Indeed, we see from Equation (8) that since any two nodes $s, t$ have the same radial coordinate $r_t$, they are connected only if the distance between them on the circle is less than a constant that depends on $t$, i.e., $t$ connects to $s$ only if $\theta_{st} < 2e^{-(2r_t - R_t)} \approx \frac{\pi m}{t}$.

In equilibrium geometric networks [47], the connections to PA, growing random graphs, and growing random geometric graphs are, respectively, the connections to the soft configuration model (random graphs with a given expected degree distribution), classical random graphs, and random geometric graphs.



VIII. EXTENSION WITH INTERNAL LINKS 



While in some real networks, e.g., citation networks, new connections appear only from new to old nodes, in some other networks, new links may connect pairs of old, previously disconnected nodes. These links are called internal, versus external links of the previous type. Examples of networks with internal links include the Internet, where existing disconnected ASs may decide to connect at some point, and social networks, where existing disconnected individuals may become friends or collaborators. Our geometric optimization framework can be easily extended to account for internal links as we show below.

At each time $t$, in addition to the $m$ external links introduced by new node $t$ (e.g., using Model2), $L$ internal connections are also created between existing disconnected pairs of nodes. Specifically, a random pair of existing nodes $i, j < t$ is selected, and then connected (given that it is disconnected) with probability $p(x_{ij}) = 1/[1 + e^{(x_{ij} - R_t)/T}]$. The step is repeated until $L$ internal links are created. This procedure is exactly the same as the procedure by which a new node $t$ connects to existing nodes in Model2. The average degree is now $\bar{k} = 2(m + L)$.
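The internal-link step described above can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the graph is an undirected adjacency dictionary, `coords` holds each node's current radial coordinate and angle, the distance uses the approximation $x_{ij} \approx r_i + r_j + \ln(\theta_{ij}/2)$ for the default curvature, $T > 0$, and the number of links per step is taken to be an integer. All names are hypothetical.

```python
import math
import random


def angular_distance(theta_a, theta_b):
    """Angle between two points on the circle, in [0, pi]."""
    return math.pi - abs(math.pi - abs(theta_a - theta_b) % (2 * math.pi))


def connection_probability(x, R_t, T):
    """p(x) = 1 / (1 + exp((x - R_t) / T)), evaluated without overflow (T > 0)."""
    z = (x - R_t) / T
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def add_internal_links(adj, coords, n_links, R_t, T, rng, max_attempts=100000):
    """Create n_links links between currently disconnected pairs of nodes.

    coords maps node -> (current radial coordinate, angle).
    Returns the list of created links (may be shorter if attempts run out).
    """
    nodes = list(adj)
    created = []
    for _ in range(max_attempts):
        if len(created) == n_links:
            break
        i, j = rng.sample(nodes, 2)
        if j in adj[i]:
            continue  # pair already connected: select another pair
        (r_i, th_i), (r_j, th_j) = coords[i], coords[j]
        dtheta = max(angular_distance(th_i, th_j), 1e-12)
        x_ij = r_i + r_j + math.log(dtheta / 2.0)
        if rng.random() < connection_probability(x_ij, R_t, T):
            adj[i].add(j)
            adj[j].add(i)
            created.append((i, j))
    return created
```

The `max_attempts` guard is a practical addition so that the loop terminates even when almost no disconnected pair is close enough to be accepted.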
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Following exactly the sam e procedure as in Section IV A[ and considering any value of hyperbolic space curvature 
$K = -\zeta^2$, $\zeta > 0$ (Section VI), the probability that existing nodes $i, j$ are selected and connected at time $t$ is

$$P(i,j,t) \propto \frac{2T}{\sin T\pi}\,\frac{1}{X(i,j,t)}, \qquad (54)$$

where

$$X(i,j,t) = e^{\frac{\zeta}{2}(r_i(t) + r_j(t) - R_t)}. \qquad (55)$$

The probability that a pair of existing nodes gets connected at time $t$ is $\frac{1}{2}\iint_1^t P(i,j,t)\,di\,dj$. Since $L$ internal links are introduced, the probability that pair $i, j$ attracts a link is

$$\Pi(i,j,t) = 2L\,\frac{P(i,j,t)}{\iint_1^t P(i',j',t)\,di'\,dj'}. \qquad (56)$$

Therefore, the probability that node $s < t$ attracts an internal link at time $t$ is

$$\Pi_{internal}(s,t) = \int_1^t \Pi(s,i,t)\,di = 2L\,\frac{e^{-\frac{\zeta}{2} r_s(t)}}{\int_1^t e^{-\frac{\zeta}{2} r_i(t)}\,di} = 2L\,\frac{s^{-\beta}}{\int_1^t i^{-\beta}\,di}, \qquad (57)$$

which is similar to the probability that node $s$ attracts an external link, with the only difference that here we have the prefactor $2L$ instead of $m$, see Equation (24). Thus, the total probability that node $s$ attracts a link at time $t$ is the probability that the node attracts an external or an internal link

$$\Pi_{total}(s,t) = (m + 2L)\,\frac{e^{-\frac{\zeta}{2} r_s(t)}}{\int_1^t e^{-\frac{\zeta}{2} r_i(t)}\,di} = (\bar{k} - m)\,\frac{e^{-\frac{\zeta}{2} r_s(t)}}{\int_1^t e^{-\frac{\zeta}{2} r_i(t)}\,di}. \qquad (58)$$

The average degree of node $s$ by time $t$ is now given by

$$\bar{k}_s(t) = m + A'\left[\left(\frac{t}{s}\right)^{\beta} - 1\right], \qquad (59)$$

where $A' = (\bar{k} - m)(\gamma - 2)$. Equation (59) is similar to Equation (15), and is identical to it if $L = 0$ (i.e., if $\bar{k} = 2m$).
From Equations (58) and (59) we see that a node of degree $k$ attracts a new link at time $t$ with probability

$$\Pi(k) = (\bar{k} - m)\,\frac{k - m + A'}{(\bar{k} - m + A')\,t}. \qquad (60)$$

The link attraction probabilities in Equations (60) and (19) are identical if $L = 0$. If $L > 0$, Equation (60) gives approximately the same probability as Equation (19) for sufficiently large $k$, and the absolute difference between the two probabilities is of order $1/t$. This observation implies that for a target $\bar{k}$ and $\beta = \frac{1}{\gamma - 1}$ the degree distributions in both cases are nearly identical, and indistinguishable from the degree distribution in networks growing according to standard PA. However, while internal links do not affect the degree distribution, they can affect other topological characteristics, e.g., they can decrease the average distance in the network. We study topological characteristics of networks growing according to popularity×similarity optimization with internal links in the next section.

We conclude this section with some additional notes. First, from the analysis above we see that, similar to external links, PA appears as an emergent effect in the internal link attraction probability as well, since a node attracts an internal link with a probability that is also proportional to its current degree. Second, temperature $T$ has the same effect on internal connections as on external connections, i.e., smaller values of $T$ increase the probability that hyperbolically close disconnected node pairs get connected, which increases clustering. Finally, the model extension with internal links can be combined with the fitness model extension described in Section V, as the former does not depend on whether nodes are moving with the same speeds or not. In this combination Equation (57) becomes

$$\Pi_{internal}(s,t) = 2L\,\frac{e^{-\frac{\zeta}{2} r_s(t)}}{\int_1^t e^{-\frac{\zeta}{2} r_i(t)}\,di} = 2L\,\frac{\eta_s\left(\frac{s}{t}\right)^{-\beta(\eta_s)}}{\int_1^t \eta_i\left(\frac{i}{t}\right)^{-\beta(\eta_i)}di},$$

which is similar to Equation (40), and a straightforward analysis as above can be applied.



IX. PROPERTIES OF REAL-WORLD VERSUS MODELED NETWORKS

In this section we compare several important properties of the real-world networks considered in Section I to the properties of modeled networks growing according to popularity×similarity optimization. Specifically, we consider the following properties:

(a) degree distribution $P(k)$;

(b) average clustering $\bar{c}(k)$ of $k$-degree nodes;

(c) average degree of neighbors $\bar{k}_{nn}(k)$ of $k$-degree nodes;

(d) distance distribution $d(l)$, i.e., the distribution of hop lengths $l$ of shortest paths between nodes in the network, or the probability that a random pair of nodes are at the distance of $l$ hops from each other;

(e) average node betweenness $\bar{B}(k)$ of $k$-degree nodes, which is the average number of shortest paths passing through a $k$-degree node, normalized by the maximum possible number of such paths.

FIG. S11: Properties of the AS Internet vs. networks grown according to popularity×similarity optimization. The plots show: (a) the degree distribution $P(k)$; (b) the average clustering $\bar{c}(k)$ of $k$-degree nodes; (c) the average neighbor degree $\bar{k}_{nn}(k)$ of $k$-degree nodes; (d) the distance distribution $d(l)$; and (e) the average node betweenness $\bar{B}(k)$ of $k$-degree nodes.

Property (c) captures degree correlations in the network. If $\bar{k}_{nn}(k)$ is an increasing function, then high (low) degree nodes connect, on average, to nodes of high (low) degree, and the network is called assortative. Otherwise, nodes of high degree tend to connect to nodes of low degree, and the network is called disassortative. Technological and biological networks are usually disassortative, while social networks are usually assortative [58]. Properties (a-c) are local statistics reflecting properties of individual nodes and their one-hop neighborhoods, as opposed to global properties (d-e), which depend on the large-scale organization of the network.
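Property (c) is straightforward to compute from an adjacency structure. The sketch below is a minimal illustration, assuming an undirected graph stored as a dictionary of neighbor sets; the function name is hypothetical. For each node it averages the degrees of its neighbors, then averages that quantity over all nodes of the same degree $k$.

```python
from collections import defaultdict


def average_neighbor_degree(adj):
    """Return {k: mean over k-degree nodes of their neighbours' mean degree}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for nbrs in adj.values():
        k = len(nbrs)
        if k == 0:
            continue  # isolated nodes have no neighbours to average over
        sums[k] += sum(len(adj[v]) for v in nbrs) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

A result that decreases with $k$, as for a star graph where the hub's neighbors all have degree 1, indicates a disassortative network; a result that increases with $k$ indicates an assortative one.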



A. Internet 



We take the Archipelago AS Internet topology of June 2009 from Section I A and compute properties (a)-(e) from above. The network consists of $t = 23748$ nodes, and has $\gamma = 2.1$, $\bar{k} \approx 5$, $\bar{c} = 0.61$. Then we grow a network according to the popularity×similarity model (Model2') up to the same number of nodes as in the real AS Internet, and with the same $\gamma$, $\bar{k}$ and $\bar{c}$. We compute the same properties in the resulting network, and compare them to those of the real Internet. The results are shown in Fig. S11, where we observe a good match between the properties of the modeled network and the real Internet. This match is even better if we allow for internal connections as described in Section VIII. In this case, each new node connects on average to $m = 1.5$ existing nodes, and at each time $L = 1$ existing disconnected pairs of nodes are connected so that $\bar{k} = 2(m + L) = 5$. With no internal links, $L = 0$ and $m = \bar{k}/2 = 2.5$.

FIG. S12: Properties of the E.coli metabolic network vs. networks grown according to popularity×similarity optimization. The plots show: (a) the degree distribution $P(k)$; (b) the average clustering $\bar{c}(k)$ of $k$-degree nodes; (c) the average neighbor degree $\bar{k}_{nn}(k)$ of $k$-degree nodes; (d) the distance distribution $d(l)$; and (e) the average node betweenness $\bar{B}(k)$ of $k$-degree nodes.



B. E.coli metabolic network 



Here we consider the entire network of metabolites from Section I B and compute properties (a)-(e) for it. Recall that the network consists of $t = 1010$ nodes, and has $\gamma = 2.5$, $\bar{k} = 6.5$, $\bar{c} = 0.48$. We grow a network according to the popularity×similarity model (Model2') up to the same number of nodes as in the metabolic network, and with the same $\gamma$, $\bar{k}$, and $\bar{c}$. We use $m = \bar{k}/2 = 3.25$. We compute the same network properties in the resulting network, and compare them to those of the real metabolic network. The results are shown in Fig. S12, where we observe a remarkable match across all five properties.



C. PGP web of trust



We now take the PGP web of trust snapshot of April 2003 from Section I C and compute its properties (a)-(e). The network consists of $t = 14367$ nodes, and has $\bar{k} = 5.3$, $\bar{c} = 0.47$. Its degree distribution is shown in Fig. S13(a), where we observe deviations from a clean power law.

This observation motivates us to grow a modeled network using the fitness model extension in Section V, i.e., Model3, which can model non-power-law degree distributions. Recall that in Model3, nodes $s$, $s < t$, move with different speeds, increasing their radial coordinate according to $r_s(t) = \beta(\eta_s) r_s + (1 - \beta(\eta_s)) r_t - \ln[\ldots]$, where $\beta(\eta_s) \propto \eta_s$, and $\eta_s$ is the fitness of $s$. To grow a network according to this model, we need to know $\beta(\eta_s)$, $\forall s < t$. Given this relation, we can find $\beta(\eta_s)$ by solving

$$r_s(t) = \beta(\eta_s) r_s + (1 - \beta(\eta_s)) r_t - \ln[\ldots], \qquad (61)$$

since we know $r_t = \ln t$, have $r_s(t)$ inferred in Section II, and can infer $r_s$ as follows. We assume that nodes with smaller current radial coordinates were born earlier, and sort them in increasing order, thus creating a sequence of current inferred radial coordinates $r_1(t), r_2(t), \ldots, r_t(t)$ for nodes born at times $s = 1, 2, \ldots, t$. Nodes for which the current radial coordinate is the same are assumed to have appeared at the same time. Using $r_s = \ln s$, and setting $\beta(\eta_{max}) = 1$, we have all the ingredients to solve Equation (61) for $\beta(\eta_s)$ for every node $s = 1, 2, \ldots, t$.

Another peculiarity of the PGP network, compared to the networks considered earlier, is a deviation of the distribution of the inferred angular distances between nodes from the uniform distribution: see Fig. S14 showing these distributions for all the considered real networks. In the PGP network, nodes with small radial coordinates are, on average, at smaller angular distances than what the uniform distribution suggests. Therefore in growing the modeled PGP network, we use the inferred angular coordinate $\theta_s$ for every node $s = 1, 2, \ldots, t$, even though our analysis in Section V assumes a uniform angular distance distribution.

FIG. S13: Properties of the PGP web of trust vs. networks grown according to popularity×similarity optimization. The plots show: (a) the degree distribution $P(k)$; (b) the average clustering $\bar{c}(k)$ of $k$-degree nodes; (c) the average neighbor degree $\bar{k}_{nn}(k)$ of $k$-degree nodes; (d) the distance distribution $d(l)$; and (e) the average node betweenness $\bar{B}(k)$ of $k$-degree nodes.

FIG. S14: Distribution of the inferred angular distances (in radians) in the AS Internet (plot (a)), E.coli metabolic network (plot (b)), and PGP web of trust (plot (c)). In each case, we first sort the nodes in increasing order of their inferred radial coordinates, so that the first $i$ nodes are the nodes with the $i$ smallest radial coordinates, and then compute the distribution of the angular distances for these first $i$ nodes, using bins of size $\delta = 0.1$. We vary $i$ from small values up to the total number of nodes in the network. The straight line in each plot is the uniform distribution $\rho(\theta) = \delta/\pi$.

Figure S13 juxtaposes properties (a)-(e) of a network grown according to Model3 up to $t = 14367$ nodes using the inferred $\beta(\eta_s)$'s and $\theta_s$'s, temperature $T = 0.2$, $m = 1$, and $L = 1.65$ ($\bar{k} = 2(m + L) = 5.3$), against the corresponding properties of the real PGP snapshot. As with the AS Internet and E.coli metabolic network, we also observe a good match between the modeled and real PGP web of trust across all these properties.

To summarize this section, synthetic networks growing according to popularity×similarity optimization reproduce several important structural characteristics of real technological, biological, and social networks. Remarkably, this optimization approach can capture the properties of both disassortative (Figs. S11(c), S12(c)) and assortative (Fig. S13(c)) networks, as well as networks with degree distributions deviating from clean power laws (Fig. S13(a) vs. Figs. S11(a), S12(a)).

X. RELATED WORK
A. Optimization 

The work that comes perhaps closest to our approach is by D'Souza et al. [13], [35]. In this work the authors show that PA can emerge in a tradeoff optimization framework requiring only local information. The framework is motivated by how connections in the Internet may take place. Specifically, the motivation is that a new AS may want to establish connections that would minimize the startup costs, while still providing good performance to its users. In the model, a new node is placed on the unit interval where distances abstract the connection fibre costs, and then connects to an existing node minimizing a balance between these costs and the shortest path hop-lengths to the core in the network, the latter abstracting performance in terms of the average delay from the new node to the rest of the network. The authors then focus only on the degree distribution in the graphs produced by this model, showing that with a specific fit of parameters, it matches well the degree distribution of the Internet extracted from the WHOIS data. The basic model studied in this work generates trees, since each incoming node connects to $m = 1$ existing nodes, but the authors suggest at the end that for $m > 1$ the model may lead to some non-zero clustering.

B. PA+similarity information

The fact that similarity between nodes affects the linking probability in networks has been observed, studied, and 
modeled extensively in the literature p!7H^ [5M5T]. Of particular interest are the works by Menczer [23, 24], where
he introduces a model for text corpora with linking probability that augments standard PA with document similarity 
measures. The latter can be the standard cosine similarity for a pair of documents, defined by the normalized 
count of words common to both documents. The author then shows that this model describes well the degree and 
similarity distributions in the DMOZ Web data and in a collection of articles published in PNAS. In [33] he also shows 
that similarity information can help to improve Web navigation, an observation confirmed later in a more abstract 
context TE\, where similarity is modeled by distances on the unit interval. In [61] a modification of the model of [24]
is proposed where the linking probability is proportional to the product of the degrees of the documents and their 
cosine similarity. The authors then show that this model can describe the clustering coefficient in document networks 
better than [24]. In [60] similarity attributes are modeled by vectors in an n-dimensional space. A new node
first selects a certain group of existing nodes (community) based on similarity distances between the new and existing 
nodes. Within the community the attachment then follows standard PA. That is, this model also augments PA with 
similarity. The authors conclude by showing that the model generates graphs with power-law degree distributions 
and exponent γ = 3, and some community structure. No real networks are considered.
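
As a rough illustration of a linking probability proportional to the product of degree and cosine similarity, the following sketch may help. It assumes plain whitespace tokenization and raw word counts; the function names and the uniform fallback when all weights vanish are hypothetical choices, not taken from the cited works.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between the word-count vectors of two documents."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def attachment_probabilities(new_doc, docs, degrees):
    """Probability of linking new_doc to each existing document.

    Assumed form: weight_j = degree_j * cosine_similarity(new_doc, doc_j),
    normalized over all existing documents.
    """
    weights = [k * cosine_similarity(new_doc, d)
               for d, k in zip(docs, degrees)]
    total = sum(weights)
    if total == 0:
        # Hypothetical fallback: no similar document, choose uniformly.
        return [1.0 / len(docs)] * len(docs)
    return [w / total for w in weights]
```

Under this form a high-degree document with no words in common with the new one receives zero weight, whereas under standard PA it would be the most likely target.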

C. PA + spatial information

A wider class of models augments PA not with similarity information per se, but with some spatial information [62–64];
see also Section 4.4 in [65]. In these models, nodes are located in some space, and the linking probability depends
not only on node degrees as in standard PA, but also on distances between nodes in the space. If this linking
probability decreases with the spatial distance fast enough, then such models generate graphs with strong clustering
for a very simple reason: since close nodes have a high probability of being connected, the triangle inequality in
the space leads to a large number of triangles in the network. Yet the mechanism responsible for power-law degree
distributions in these models is the same PA.



D. Hidden variables 

A yet wider and more general class of models, to which our approach actually belongs, is that of network models with
hidden variables [TTl[5n]. In these models, some hidden variables are first assigned to nodes, and the linking probability
between a pair of nodes is then a function of the values of their hidden variables. For example, in [11] the authors
show that a combination of exponentially distributed hidden variables and step-function connection probability leads
to power-law degree distributions and strong clustering in modeled networks, while in [SD] it is shown that PA itself
can be cast as a hidden variable model, where one of the hidden variables is the node birth time.

E. Clustering

A variety of other mechanisms have been proposed to fix the zero-clustering problem with PA. One such mechanism 
is node activation/deactivation [66], motivated by citation networks. A set of m active nodes is maintained in the
model, and the new, initially active node connects to this set by m links. One active node is then deactivated with 
probability inversely proportional to the node degree. Because of this inverse proportionality imposed by the model, 
the model effectively implements linear PA. Because the new connections are made to local groups of active nodes, 
clustering is strong. However, as shown in [67], the model is effectively one-dimensional, lacking the small-world
property observed in many real networks. 

Another popular mechanism enforcing strong clustering is random walks [12, 68]. A new node connects first to
a random existing node, and then with some probability to one of its neighbors, and possibly to a neighbor of its 
neighbor, etc. Clustering is strong because the connections are concentrated in a local neighborhood of the attachment 
node. 
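
The random-walk mechanism just described can be sketched as follows. This is a simplified variant under stated assumptions, since the cited models differ in their details: the continuation probability `p`, the two-node seed graph, and the function name are all illustrative.

```python
import random


def grow_random_walk(n, p, seed=0):
    """Grow a graph in which each new node attaches along a short random walk.

    Assumed rules (illustrative): the new node links to a uniformly random
    existing node; then, with probability p, the walk steps to a random
    neighbor of the last visited node and the new node links to it as well,
    repeating until the walk stops.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}            # seed graph: a single link
    for new in range(2, n):
        current = rng.randrange(new)  # uniformly random existing node
        targets = {current}
        while rng.random() < p:
            # sorted() only fixes the iteration order for reproducibility
            current = rng.choice(sorted(adj[current]))
            targets.add(current)
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
    return adj
```

Whenever the walk takes at least one step to a node it has not visited yet, the new node closes a triangle with two adjacent existing nodes, which is the source of the strong clustering. With `p = 0` the walk never moves and the model reduces to uniform random attachment, producing a tree.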



F. Emergent PA 

The lack of clustering is not the only problem with standard PA. Another problem is that PA per se is simply 
impossible in a vast majority of real networks because to "implement" PA, the network evolution process must 
"know" the global current structure of the whole network in order to compute the degree for each node. Since such 
knowledge is often unavailable in reality, PA must be an emergent phenomenon, i.e., an effective result of some other 
underlying evolution processes that use only local information. Yet another related problem is that such processes 
must lead to exactly linear PA, since if the attachment probability is not a linear function of node degree, then the 
degree distribution in the network is not a power law jB] . Several mechanisms have been proposed to address these two 
problems as well. The aforementioned random walks, for example, do solve them both because the probabilities of the 
stationary distribution of a random walk on a graph are linearly proportional to node degrees. Another interesting 
observation was made in [14], where the authors show that connections based solely on node ranking may lead to power
laws, the motivation being that node ranking is a coarser proxy to popularity than the node degree. Yet the simplest
and perhaps the first model that addresses the three mentioned concerns with PA (zero clustering, global knowledge,
and linearity) is by Dorogovtsev et al. [9]: the new node simply selects a random existing link, and connects to
both of its ends. Clustering is obviously strong, and linear PA is resurrected because the probability that a random link is
attached to a node of degree k is proportional to k. However, this model is clearly a toy model, and there have been 
no attempts to validate it against any real networks. 
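
The link-selection rule described above is short enough to sketch directly. The single-link seed graph and the function names below are assumptions made for illustration. A node of degree k is an endpoint of exactly k links, so choosing a link uniformly at random reaches that node with probability proportional to k, without any global knowledge of the degrees.

```python
import random
from collections import Counter


def grow_by_random_link(n, seed=0):
    """Each new node picks a uniformly random existing link and connects
    to both of its ends, forming a triangle with that link."""
    rng = random.Random(seed)
    edges = [(0, 1)]                  # assumed seed graph: one link
    for new in range(2, n):
        u, v = rng.choice(edges)      # local information only
        edges.append((new, u))
        edges.append((new, v))
    return edges


def degree_counts(edges):
    """Degree of every node in the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg
```

Every node added after the seed has degree at least 2 and belongs to at least one triangle by construction, which is why clustering is strong in this model.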

G. Discussion 

As far as validation is concerned, the model validation methodology is usually limited to generating synthetic graphs 
according to the model prescription, and comparing one or more of their structural properties, such as the degree 
distribution, against those in real networks. Remarkably, the core of the network evolution mechanism proposed by 
a model is quite rarely validated directly, because such validation is either difficult or impossible. In similarity-based 
models, for example, such validation is difficult because there are too many different similarity measures, and it is 
usually unclear which one should be used in which case [351 US], so that cases where model predictions are validated
directly against real-world similarity data [331 [211 EI] are rare, and usually limited to specific (types of) networks. 

Within our approach, the direct validation of the network evolution mechanism is also difficult but possible. It is 
possible because we can infer the node coordinates in the generic similarity space as discussed in Section [IT] and then 
check if the linking probability in real networks as a function of distances between nodes in this space is close to our 
model predictions, see Fig. 3. 

In summary, the salient feature of our approach is that it simultaneously: 

1. shows that similarity plays an important and fundamental role in evolution of complex networks; 

2. does so by means of a very simple and general geometric model; 

3. admits a complete analytic treatment; 

4. directly validates the modeled similarity mechanism and its analytic predictions against drastically different real 
networks from different domains; 

5. reproduces many important structural properties of these networks; and 

6. resolves all the mentioned concerns with preferential attachment, which appears in the approach as an emergent
phenomenon.